CPU and System Performance

Snapdragon 820 included Qualcomm’s first fully-custom 64-bit CPU cores. The unique microarchitecture’s floating-point IPC was very good, but its integer IPC was no better than that of ARM’s older A57 core. Its power efficiency was lower than that of competing cores as well. Instead of using a revised quad-core Kryo arrangement for Snapdragon 835, Qualcomm decided to go in a completely different direction.

The new Kryo 280, despite the similar name, shares no design DNA with the original Kryo. It’s an octa-core, big.LITTLE configuration with four “performance” cores and four lower-power “efficiency” cores. What makes Kryo 280 unique, however, is that it’s the first design to use ARM’s new "Built on ARM Cortex Technology" (BoC) license, which allows vendors to customize ARM cores. This new semi-custom option gives vendors the ability to differentiate their products from those using ARM’s stock cores while avoiding the more costly route of creating a fully-custom design from scratch.

The BoC license allows the vendor to request certain modifications, particularly to the fetch block and issue queues, but certain parts of the microarchitecture are off limits, including the decoder and execution pipelines, because modifying these blocks requires too much effort. Qualcomm is not disclosing which ARM cores serve as the foundation for Kryo 280 or precisely which modifications it requested, but it did say that both CPU clusters use semi-custom cores. Qualcomm also confirmed that Snapdragon 835’s memory controllers are its own design.

Geekbench 4 - Integer Performance
Single Threaded

Workload                Snapdragon 835      Snapdragon 821 (% Advantage)  Snapdragon 810 (% Advantage)
AES                     905.40 MB/s         559.10 MB/s (61.9%)           714.47 MB/s (26.7%)
LZMA                    3.13 MB/s           2.20 MB/s (42.3%)             1.92 MB/s (63.0%)
JPEG                    16.80 Mpixels/s     21.60 Mpixels/s (-22.2%)      12.27 Mpixels/s (36.9%)
Canny                   23.60 Mpixels/s     30.27 Mpixels/s (-22.0%)      23.63 Mpixels/s (-0.1%)
Lua                     1.84 MB/s           1.47 MB/s (25.2%)             1.20 MB/s (53.3%)
Dijkstra                1.73 MTE/s          1.39 MTE/s (24.5%)            0.91 MTE/s (90.1%)
SQLite                  53.00 Krows/s       36.67 Krows/s (44.5%)         33.30 Krows/s (59.2%)
HTML5 Parse             8.67 MB/s           7.61 MB/s (13.9%)             6.38 MB/s (35.9%)
HTML5 DOM               2.26 Melems/s       0.37 Melems/s (510.8%)        1.26 Melems/s (79.4%)
Histogram Equalization  52.90 Mpixels/s     51.17 Mpixels/s (3.4%)        53.60 Mpixels/s (-1.3%)
PDF Rendering           50.90 Mpixels/s     52.97 Mpixels/s (-3.9%)       43.70 Mpixels/s (16.5%)
LLVM                    196.80 functions/s  113.53 functions/s (73.3%)    108.87 functions/s (80.8%)
Camera                  5.71 images/s       7.19 images/s (-20.6%)        4.69 images/s (21.7%)
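
The "% Advantage" figures appear to express the Snapdragon 835 result relative to each older SoC. A minimal Python sketch of that arithmetic, assuming the usual (new / old - 1) x 100 definition, recomputes a few of the integer rows. The helper name `pct_advantage` is ours for illustration and is not part of Geekbench.

```python
def pct_advantage(new: float, old: float) -> float:
    """Percent by which `new` exceeds `old`; negative values are regressions."""
    return (new / old - 1.0) * 100.0


# (Snapdragon 835, Snapdragon 821) single-threaded results from the table above
rows = {
    "AES (MB/s)": (905.40, 559.10),
    "JPEG (Mpixels/s)": (16.80, 21.60),
    "LLVM (functions/s)": (196.80, 113.53),
}

for name, (s835, s821) in rows.items():
    print(f"{name}: {pct_advantage(s835, s821):+.1f}%")
```

Because the published throughput values are themselves rounded, recomputing a row this way can differ from the printed percentage by a few tenths of a point.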

The Snapdragon 835’s Kryo 280 CPU shows a noticeable improvement in integer IPC relative to the 820/821’s Kryo core. This is not unexpected, however, considering integer performance was not one of Kryo’s strengths. While most workloads see large increases, there are a few regressions too, notably in JPEG, Canny, and Camera. We saw this same performance pattern from Kirin 960’s A73 CPU as well. These integer results, along with L1/L2 cache behavior, match the A73’s unique performance fingerprint, confirming that Kryo 280’s performance cores are based on ARM’s latest IP.

Quickly comparing Snapdragon 835 and Kirin 960 Geekbench 4 Integer results also shows performance variations that cannot be fully explained by differences in frequency or normal testing variance. The differences only occur in a few specific tests and range from 9% to -5%, which again is not completely unexpected given the limited number of modifications the BoC license allows for semi-custom designs.

Geekbench 4 (Single Threaded) Integer Score/MHz

The chart above divides the overall integer score by CPU frequency, making it easier to directly compare IPC. Taken as a whole, the performance of Kryo 280’s semi-custom performance core is not much different than the Kirin 960’s A73 core in this group of workloads, with individual gains and losses nearly averaging out. Its overall IPC is also only about 6% higher than A72 and 14% higher than A57. Its advantage over Snapdragon 820/821 widens to 22%, partly because Kryo’s poor performance in the LLVM and HTML5 DOM workloads drags down its overall score.
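
To illustrate how a single outlier workload can move an aggregate figure, the rough sketch below takes the geometric mean of the Snapdragon 835 / Snapdragon 821 ratios from the integer table, with and without HTML5 DOM. It assumes the aggregate behaves like a geometric mean of per-workload results, and it uses raw throughput rather than per-MHz values, so it is not expected to reproduce the chart's numbers. The function name `geomean_ratio` is hypothetical.

```python
import math

# (Snapdragon 835, Snapdragon 821) single-threaded integer results, raw (not per MHz)
results = {
    "AES": (905.40, 559.10),
    "LZMA": (3.13, 2.20),
    "JPEG": (16.80, 21.60),
    "Canny": (23.60, 30.27),
    "Lua": (1.84, 1.47),
    "Dijkstra": (1.73, 1.39),
    "SQLite": (53.00, 36.67),
    "HTML5 Parse": (8.67, 7.61),
    "HTML5 DOM": (2.26, 0.37),
    "Histogram Equalization": (52.90, 51.17),
    "PDF Rendering": (50.90, 52.97),
    "LLVM": (196.80, 113.53),
    "Camera": (5.71, 7.19),
}


def geomean_ratio(pairs) -> float:
    """Geometric mean of new/old ratios over an iterable of (new, old) pairs."""
    logs = [math.log(new / old) for new, old in pairs]
    return math.exp(sum(logs) / len(logs))


all_tests = geomean_ratio(results.values())
without_dom = geomean_ratio(v for k, v in results.items() if k != "HTML5 DOM")

print(f"All workloads:     {(all_tests - 1) * 100:.1f}% advantage")
print(f"Without HTML5 DOM: {(without_dom - 1) * 100:.1f}% advantage")
```

Dropping the one workload where Kryo scores unusually poorly cuts the aggregate advantage by roughly half in this sketch, which is consistent with the point that a few weak results drag down Kryo's overall score.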

While Snapdragon 835 leads other SoCs by a slim margin in this test, it’s not a sweeping victory. Just like we saw with Kirin 960’s A73 cores, performance improves in some workloads but regresses in others.

Geekbench 4 - Floating Point Performance
Single Threaded

Workload                Snapdragon 835      Snapdragon 821 (% Advantage)  Snapdragon 810 (% Advantage)
SGEMM                   11.5 GFLOPS         12.2 GFLOPS (-5.7%)           11.0 GFLOPS (4.2%)
SFFT                    2.9 GFLOPS          3.2 GFLOPS (-9.7%)            2.3 GFLOPS (25.2%)
N-Body Physics          879.6 Kpairs/s      1156.7 Kpairs/s (-24.0%)      580.2 Kpairs/s (51.6%)
Rigid Body Physics      6181.7 FPS          7171.3 FPS (-13.8%)           4183.4 FPS (47.8%)
Ray Tracing             232.6 Kpixels/s     298.7 Kpixels/s (-22.0%)      130.1 Kpixels/s (78.7%)
HDR                     7.8 Mpixels/s       10.8 Mpixels/s (-27.6%)       6.4 Mpixels/s (21.9%)
Gaussian Blur           23.4 Mpixels/s      48.5 Mpixels/s (-51.8%)       21.9 Mpixels/s (6.7%)
Speech Recognition      13.9 Words/s        10.9 Words/s (27.5%)          8.1 Words/s (71.4%)
Face Detection          513.8 Ksubs/s       685.0 Ksubs/s (-25.0%)        404.4 Ksubs/s (27.0%)

Snapdragon 835’s Kryo 280 takes two steps backwards when running Geekbench 4’s floating-point workloads, finishing well behind Snapdragon 820/821’s Kryo core and even a little behind SoCs using the A72 core. Its IPC is on par with the Kirin 960’s A73 core, with even less variation between individual scores than we saw when running the integer workloads.

The A73’s slight performance regression relative to the A72, which also applies to the semi-custom Kryo 280, is a bit surprising, because their NEON execution units are relatively unchanged from the A72’s design. If anything, the A73’s lower-latency front end and improvements to its fetch block and memory system should give it an advantage, but that’s not the case. The A73’s narrower decode stage could limit performance for some workloads but not all. Both the Kirin 960’s A73 and Snapdragon 835’s Kryo 280 show reduced L2 cache read/write bandwidth (and lower L1 write bandwidth) relative to A72, which could also negatively impact performance.

Geekbench 4 (Single Threaded) Floating Point Score/MHz

Snapdragon 835’s floating-point IPC is 23% lower than Snapdragon 820/821’s. One has to wonder if this is the result of a forced compromise or a willing change in design philosophy. When Qualcomm started work on Kryo more than 2 years ago, it may have envisioned new workloads that never materialized. Or it could be that with more compute workloads shifting to the GPU and DSP to improve efficiency, it was willing to sacrifice some floating-point performance to save area and power.

Geekbench 4 - Memory Performance
Single Threaded

Workload                Snapdragon 835      Snapdragon 821 (% Advantage)  Snapdragon 810 (% Advantage)
Memory Copy             4.70 GB/s           7.82 GB/s (-39.9%)            3.99 GB/s (17.8%)
Memory Latency          13.95 Mops/s        6.64 Mops/s (110.1%)          4.29 Mops/s (225.2%)
Memory Bandwidth        17.95 GB/s          13.53 GB/s (32.7%)            7.15 GB/s (151.0%)

The Kryo 280, A73, A72, and A57 cores all have 2 address generation units (AGUs). Unlike the A72/A57, however, which use dedicated AGUs for load and store operations, each AGU in Kryo 280/A73 is capable of performing both operations. For Kirin 960, this change, among others, reduces memory latency and significantly improves bandwidth to main system memory relative to Kirin 950.

Snapdragon 835’s memory latency and bandwidth numbers are even better than Kirin 960’s—up to 11% after accounting for differences in CPU frequency. The 835 sees impressive gains over the 820/821 too. Switching to Kryo 280 does not provide the same bandwidth boost as the switch to A73 did for Kirin 960, however, because Kryo’s 2 AGUs were already capable of performing both load and store operations, albeit with a higher latency in some cases.
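
"Accounting for differences in CPU frequency" here means comparing results per MHz rather than as raw numbers. The sketch below shows that normalization under simple assumptions. The function names are ours, and the inputs are round placeholder values chosen for illustration, not measurements from this review.

```python
def per_mhz(value: float, freq_mhz: float) -> float:
    """Benchmark result divided by CPU clock, a rough proxy for per-clock throughput."""
    return value / freq_mhz


def normalized_advantage(a_value: float, a_mhz: float,
                         b_value: float, b_mhz: float) -> float:
    """Percent advantage of A over B after dividing each result by its clock."""
    return (per_mhz(a_value, a_mhz) / per_mhz(b_value, b_mhz) - 1.0) * 100.0


# Hypothetical placeholder inputs, NOT data from the article:
# chip A scores 110 at 2200 MHz, chip B scores 100 at 2000 MHz.
raw = (110.0 / 100.0 - 1.0) * 100.0
norm = normalized_advantage(110.0, 2200.0, 100.0, 2000.0)

print(f"Raw advantage:     {raw:.1f}%")
print(f"Per-MHz advantage: {norm:.1f}%")
```

In this made-up case the raw lead comes entirely from the higher clock, so it disappears once both results are divided by frequency. A lead that survives the same normalization points to a difference in the core or memory system instead.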

System Performance

So far our initial results show Snapdragon 835’s Kryo 280 is a big.LITTLE combination of semi-custom A53 and A73 CPU cores, whose integer and floating-point IPC is similar to Kirin 960. System-level tests like PCMark, which includes several realistic workloads that stress the CPU, GPU, RAM, and NAND storage using standard Android API calls, are affected by more than just CPU IPC and memory latency, however. Device OEMs tune the software parameters that control the scheduler and DVFS systems to achieve the desired balance between performance and battery life, to meet quality of service goals, and to stay within a particular design's thermal limits.

No doubt we'll see performance vary among the upcoming Snapdragon 835 devices, just like we do with other SoCs, but for now we see Qualcomm’s 835 MDP/S with the top overall score in PCMark, just barely ahead of the Mate 9 and its Kirin 960 SoC. It’s also 23% faster overall than the top-performing Snapdragon 821 phone.

PCMark - Work 2.0 Performance Overall

PCMark - Web Browsing 2.0

PCMark - Writing 2.0

PCMark - Data Manipulation 2.0

The Snapdragon 835 MDP/S performs well in the Web test, although its advantage over the Mate 9 is only 10%. Its performance lead over the Snapdragon 820/821 phones, which all fall behind SoCs using ARM’s A72 and A73 CPUs, grows to 34% in this integer-heavy test.

The PCMark Writing test generates frequent, short bursts of activity on the big CPU cores while performing a variety of operations, including PDF processing and file encryption (both integer workloads), memory operations, and even reading and writing some files to internal NAND. Because of this, it tends to produce the most varied results. Take the spread between the Snapdragon 820/821 phones, for example, where the LeEco Le Pro3 is 40% faster than the Galaxy S7 edge. The performance difference between the Snapdragon 835 MDP/S and Mate 9 is negligible, however. Comparing Snapdragon 835 to older members of the Snapdragon family reveals more significant differences; it’s 24% faster than the LeEco Le Pro3 (S821), 80% faster than the Nexus 6P (S810), and 162% faster than the Lenovo ZUK Z1 (S801AC).

The PCMark Data Manipulation test is another primarily integer workload that measures how long it takes to parse chunks of data from several different file types and then records the frame rate while interacting with dynamic charts. Once again the Snapdragon 835 MDP/S and Mate 9 deliver similar performance, but they separate themselves a little further from the pack. Like we saw in the Writing test, the phones using Snapdragon 820 show significant performance variation, providing another example of how OEM tinkering impacts the user experience. The Snapdragon 835 MDP/S outperforms the Pixel XL by 28% and the LG G5 by 111%.

PCMark - Video Editing 2.0

PCMark - Photo Editing 2.0

The Video Editing test, which uses OpenGL ES 2.0 fragment shaders for applying video effects, actually presents a very light load to the system. After monitoring the behavior of several phones while running this test, I’ve noticed that GPU frequency remains close to idle and most phones do not migrate threads to the big CPU cluster, using the little A53 cluster exclusively, which is why we see very little performance variation in this test.

The Photo Editing test applies a number of different photo effects and filters with both the CPU and GPU. The Snapdragon 835 MDP/S and the phones using Snapdragon 820/821 rise to the top of the chart thanks to their Adreno GPUs’ strong ALU performance. The 835’s Adreno 540 GPU helps it perform 33% better than the highest performing phone with an ARM GPU, the Mate 9 and its Mali-G71.

Kraken 1.1 (Chrome/Safari/IE)

WebXPRT 2015 (Chrome/Safari/IE)

JetStream 1.1 (Chrome/Safari)

Yes, the iPhones perform well in these JavaScript tests. No, you cannot use these tests to compare IPC between Apple’s A-series SoCs and those found in Android phones, because they are running different browsers. A significant portion of the iPhones’ performance advantage actually comes from Safari’s JavaScript engine.

The Snapdragon 835 MDP/S compares favorably to other phones using the Chrome browser (all of the phones are using the latest version). It joins the Snapdragon 820/821 phones at the top of the chart in Kraken, although its performance there is no different from theirs. It essentially matches the Mate 9 in JetStream too, but pulls ahead of the Snapdragon 820/821 phones by 15% to 37%. Performance is unexpectedly good in WebXPRT 2015, where it pulls ahead of the Mate 9 by 24% and of the Galaxy S7 (S820) by up to 67%.

As an additional point of interest, and to further highlight the software layer’s effects, we also ran these tests using Qualcomm’s internally developed browser that’s optimized for Snapdragon SoCs. Kraken only sees a modest improvement to 2,305 ms, but JetStream improves by 24% to 87 and WebXPRT 2015 jumps to 280, an 82% improvement.
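
The stock-Chrome scores behind those percentages can be approximated by working backwards from the improved score and the stated gain. This is only a sketch: the published percentages are rounded, so the results are estimates, and the helper name `implied_baseline` is hypothetical. It applies to higher-is-better scores such as JetStream and WebXPRT. Kraken is measured in milliseconds (lower is better) and no percentage is given for it, so it is left out.

```python
def implied_baseline(improved: float, pct_gain: float) -> float:
    """Score before a `pct_gain` percent improvement that produced `improved`."""
    return improved / (1.0 + pct_gain / 100.0)


# Figures quoted in the paragraph above
jetstream_stock = implied_baseline(87.0, 24.0)   # JetStream: 87 after a 24% gain
webxprt_stock = implied_baseline(280.0, 82.0)    # WebXPRT 2015: 280 after an 82% gain

print(f"Implied JetStream score in Chrome:    ~{jetstream_stock:.0f}")
print(f"Implied WebXPRT 2015 score in Chrome: ~{webxprt_stock:.0f}")
```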


128 Comments

  • niva - Thursday, March 23, 2017

    You keep saying that it's not "real world" when earlier there were links provided that should be showing you that in the real world, today, multithread already matters, and having more real/virtual cores helps. This is all for the simplest and most used task for cell phones, web browsing, multi threading is quite useful. I'm fairly confident that if an Android manufacturer ran on hardware identical to the iPhone and outscored it across the board, you'll buy the iPhone anyways. Good for you but you're not helping in this discussion, just admit your apple fanboyism and bow out. Why do you even care about the SD 835?
  • yankeeDDL - Thursday, March 23, 2017

    Let me be more specific then.
    Web browsing is a key part of mobile experience. In the Kraken, WebXPRT and JetStream the performance difference is stunning: Kraken: 2.4X faster; WebXPRT: 1.35X faster, JetStream: 2.4X faster.
    Yes, the difference is not only due to HW, but also to code optimization. Still: damn!

    In the GPU department: in GFXBench, the performance is on-par (when reported).
    There's a noticeable advantage of the S835 in 3DMark (1.4X Overall), but in Basemark we lose again by nearly 2X.

    Yes, the comparison is (a bit) apples to oranges, but one has to admit that for a brand new SoC it would make sense to expect a hands-down victory over a noticeably older phone.
  • BurntMyBacon - Thursday, March 23, 2017

    @yankeeDDL: "Still: damn!"
    Agreed, Apple has a massive advantage in javascript benchmarks. It is impossible to say how much (if any) of that is due to the SoC vs the software stack, but the advantage is undeniable.

    It is not unexpected that the A10 would win in Basemark. The A10 is making use of a low level API (Metal) where the SD835 is using a high level API (OpenGL ES). Again, Apple's better software cohesion and better use of APIs benefits them here. Still, the difference is quite formidable and the SD835 actually loses to the Kirin 960 as well. It would seem that the Adreno 540 is not well suited to this workflow. Therefore, it is unlikely that use of Vulkan will suddenly propel them ahead, but the gap would be a lot smaller. By the time use of Vulkan becomes commonplace, A11(?) will be out, so it's really a moot point.

    The GFXBench Car Chase ES 3.1 / Metal chart title suggests it should have an Apple data point (only user of "Metal"). It'll likely show the same thing as the basemark test given the disparate APIs, but I'm still curious (though not critical without further considerations) as to why it wasn't included.

    The fact that you can't get an A10 without iOS and you can't get iOS on another company's SoC makes considerations about whether it is better than the android SoCs or not a tertiary concern and academic when compared to the overall platform experience. There are plenty of reasons not to like an iProduct. Performance isn't generally one of them.
  • tuxRoller - Friday, March 24, 2017

    If you take into account Qualcomm's optimized browser, the differences relative to the iPhone 7 change to:
    (% better than the sd835)
    Kraken: 140 -> 106 webxprt: 35 -> -26 jetstream: 140 -> 92

    I'm sure they could do more, but I'd be amazed if the remaining differences in kraken & jetstream were mostly due to software.
  • Despoiler - Wednesday, March 22, 2017

    It's mostly the OS that Apple has superiority in. That's why they can use a dual core while Android phones are quad or octacore.
  • grayson_carr - Saturday, March 25, 2017

    Isn't the A10 a quad core, or more correctly, a dual dual core big.LITTLE chip? Same as Snapdragon 820?
  • akdj - Friday, March 31, 2017

    Yes, A10 is a quad-core big.LITTLE SoC, w/a 12-core GPU, I believe... as well, Apple on the 7+ added another GB of RAM = 3GB on an iOS phone, iPad 12.9" has 4GB -- but the iOS integration with the A10... as well, the last several generations of 'home brewed' ARM chips - and Apple's investments in silicon engineering from nVidia, Intel, AMD, Qualcomm, TI and others has paid dividends.

    That said, off Apple/AX chipsets for a second.... excellent 'first look' and factory/testing insight. That is very cool stuff!

    I think, as geeks, and 'passionate' groups of faithful mobile phone OS folks amongst our population, folks who take this stuff more seriously than the Sunday sermon... we should all take a breath and remember that it's a 'chip preview'
    Not an Android phone!

    The issues with using the same chip on every device running Android --and every OEM 'skinning' their handset is a huge contributor to the varying performances; real world or objective bench tests. Like Windows as an OS on the desk and lap over the years, we've ALL had our 'Vista' moments. I'm an OS X/macOS user specifically because of 'vista and a curiosity about OS X in 2006 --- but again, I digress...

    Qualcomm has built a chip able to be put in to every flagship other than iOS this year and 'compete' just fine. In the performance metrics all above are bickering about. But as an iPhone 7+ owner/lover (it's an excellent phone) -- my appreciation for the 835 goes well beyond its parity or near ...or exceeding metrics of CPU and GPU, they've built a gigabit LTE modem (who cares if you won't 'get that' - it's still gonna haul ass!) - incredible image processing and 'encryption/protection' with its iris scanning and biometric uses ...as well as the smaller node, the AIO model with all,parts of the 'brain' build in house --- IMHO, it's a 50-50 tie between chip engineering but I'm bias as an ambidextrous user since '07/'08 (iOS/Android) - I have always had one of each, the family is iOS and since switching everyone over, my workload has decreased 95%. It's vertical and horizontal integration and aggregation with macOS is, still to me, science fiction and for the family business... a God Send.

    That said --- my S6 has an Exynos (sp?) processor, Note 4 was Qualcomm and, as I skip Android gens, my next will be a Qualcomm. I know as an iPhone 7+ owner I was delighted to learn that the model I bought has a Qualcomm modem, not the Intel;)

    Special trip for you guys. Great write up and truly amazing to me ...I'm 45, born with the 8086 processor and the progress mankind has made in such a tiny package, which is high speed connected with exponentially more power than just a decade ago... in our pocket. We all need to remember between our friendly iOS/Android 'disputes' -- the special world we enjoy today specifically BECAUSE Apple and Google/Qualcomm/SnapDragons and their host of OEMs building what just a decade ago meant 110v, plugged in No mobility, significantly slower - even wired connectivity. None of the 'Millions' of free, $1, $5 & ten buck 'programs/software' then, apps, now - available on demand! Over 30 million song libraries, endless knowledge and tools, true magic is what I think the SD835 A10 Fusion and their predecessors are/were.
    I'm old now, but not compared with the mountains I live in -- lucky enough to have spent the second ½ of my life quite literally watching these chips come to fruition ...I think it's the A10 when announced... it had/has over 3 billion transistors... and the SoC's the size of our fingernail!

    Screw arguing. It's a competitive world and WE are the beneficiaries!
  • edlee - Wednesday, March 22, 2017

    the 835 has a 10% stronger gpu than a10, it's just nuts that apple, not being a cpu designer at heart, can design a better cpu/soc that is years ahead of what arm and qualcomm can produce
  • BedfordTim - Wednesday, March 22, 2017

    There is a price/performance trade off with processors. Apple has chosen to make a much bigger processor which is why it is faster. Think Atom vs Core. One is slow but cheap and one is expensive and fast.
    Apple are not "years ahead". They have chosen to spend more on the processor.
  • Lord-Bryan - Wednesday, March 22, 2017

    "They have chosen to spend more on the processor"
    They also had 64bit arm cores 2 years before Qualcomm released theirs, and that is why they are years ahead in performance and power efficiency
