iPad 4 GPU Performance Analyzed: PowerVR SGX 554MP4 Under the Hood
by Anand Lal Shimpi on November 2, 2012 1:46 PM EST
As always, our good friends over at Kishonti managed to have the first GPU performance results for the new 4th generation iPad. Although the new iPad retains its 2048 x 1536 "retina" display, Apple claims a 2x improvement in GPU performance through the A6X SoC. The previous generation chip, the A5X, had two ARM Cortex A9 cores running at 1GHz paired with four PowerVR SGX 543 cores running at 250MHz. The entire SoC integrated 4 x 32-bit LPDDR2 memory controllers, giving the A5X the widest memory interface on a shipping mobile SoC in the market at the time of launch.
The A6X retains the 128-bit wide memory interface of the A5X (and it keeps the memory controller interface adjacent to the GPU cores and not the CPU cores as is the case in the A5/A6). It also integrates two of Apple's new Swift cores running at up to 1.4GHz (a slight increase from the 1.3GHz cores in the iPhone 5's A6). The big news today is what happens on the GPU side. A quick look at the GLBenchmark results for the new iPad 4 tells us all we need to know. The A6X moves to a newer GPU core: the PowerVR SGX 554.
Mobile SoC GPU Comparison
GPU | PowerVR SGX 543 | PowerVR SGX 543MP2 | PowerVR SGX 543MP3 | PowerVR SGX 543MP4 | PowerVR SGX 554 | PowerVR SGX 554MP2 | PowerVR SGX 554MP4
Used In | - | iPad 2 | iPhone 5 | iPad 3 | - | - | iPad 4
SIMD Name | USSE2 | USSE2 | USSE2 | USSE2 | USSE2 | USSE2 | USSE2
# of SIMDs | 4 | 8 | 12 | 16 | 8 | 16 | 32
MADs per SIMD | 4 | 4 | 4 | 4 | 4 | 4 | 4
Total MADs | 16 | 32 | 48 | 64 | 32 | 64 | 128
GFLOPS @ 300MHz | 9.6 GFLOPS | 19.2 GFLOPS | 28.8 GFLOPS | 38.4 GFLOPS | 19.2 GFLOPS | 38.4 GFLOPS | 76.8 GFLOPS
As always, Imagination doesn't provide a ton of public information about the 554 but based on what I've seen internally it looks like the main difference between it and the 543 is a doubling of the ALU count per core (8 Vec4 ALUs per core vs. 4 Vec4). Chipworks' analysis of the GPU cores helps support this: "Each GPU core is sub-divided into 9 sub-cores (2 sets of 4 identical sub-cores plus a central core)."
I believe what we're looking at is the 8 Vec4 SIMDs (each one capable of executing 8+1 FLOPS). The 9th "core" is just the rest of the GPU including tiler front end and render backends. Based on the die shot and Apple's performance claims it looks like there are four PowerVR SGX554 cores on-die, resulting in peak theoretical performance greater than 77 GFLOPS.
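The peak figures in the table can be reproduced with simple arithmetic. Below is a minimal sketch in Python, assuming each MAD counts as 2 FLOPs per clock (one multiply plus one add), which is what the table's numbers imply. The function and variable names are illustrative and do not come from Imagination or Apple.

```python
# Peak theoretical throughput:
#   SIMDs x MADs per SIMD x FLOPs per MAD x clock
# Assumption: a MAD is counted as 2 FLOPs (multiply + add), matching the table.
FLOPS_PER_MAD = 2


def peak_gflops(simds: int, mads_per_simd: int, clock_mhz: float) -> float:
    """Return peak theoretical GFLOPS for a GPU configuration."""
    return simds * mads_per_simd * FLOPS_PER_MAD * clock_mhz / 1000.0


# (number of SIMDs, MADs per SIMD), taken from the comparison table.
configs = {
    "PowerVR SGX 543": (4, 4),
    "PowerVR SGX 543MP2": (8, 4),
    "PowerVR SGX 543MP3": (12, 4),
    "PowerVR SGX 543MP4": (16, 4),
    "PowerVR SGX 554": (8, 4),
    "PowerVR SGX 554MP2": (16, 4),
    "PowerVR SGX 554MP4": (32, 4),
}

if __name__ == "__main__":
    for name, (simds, mads) in configs.items():
        print(f"{name}: {peak_gflops(simds, mads, 300):.1f} GFLOPS @ 300MHz")
```

At the table's 300MHz reference clock this gives 76.8 GFLOPS for the 554MP4, so any GPU clock slightly above 300MHz pushes the peak past 77 GFLOPS.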
There's no increase in TMU or ROP count per core; the main change between the 554 and 543 is the addition of more ALUs. There are some more low level tweaks which help explain the different core layout from previous designs, but nothing major.
With that out of the way, let's get to the early performance results. We'll start with low level fill rate and triangle throughput numbers:
Fill rate goes up by around 15% compared to the iPad, which isn't enough to indicate a huge increase in the number of texture units on the 554MP4 vs. the 543MP4. What we may be seeing here instead are benefits from higher clocked GPU cores rather than more texture units. If this is indeed the case it would indicate that the 554MP4 changes the texture to ALU ratio from what it was in the PowerVR SGX 543 (Update: this is confirmed). The data here points to a GPU clock at least 15% higher than the ~250MHz in the 3rd generation iPad.
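The clock estimate follows directly from the fill rate gain. Here is a rough sketch in Python, assuming the texture unit count is unchanged and that fill rate scales linearly with GPU clock (no memory bandwidth limit). Both are simplifying assumptions, and the function name is made up for illustration.

```python
def implied_clock_mhz(base_clock_mhz: float, fill_rate_gain: float) -> float:
    """Estimate the GPU clock needed to explain a fill rate gain.

    Assumes the same number of texture units and purely linear scaling
    of fill rate with clock, so the result is a lower bound rather than
    a measured frequency.
    """
    return base_clock_mhz * (1.0 + fill_rate_gain)


if __name__ == "__main__":
    # ~250MHz in the 3rd generation iPad, ~15% higher fill rate on the iPad 4.
    estimate = implied_clock_mhz(250.0, 0.15)
    print(f"Implied minimum GPU clock: {estimate:.1f} MHz")
```

Under those assumptions the 15% fill rate gain corresponds to a GPU clock of roughly 287.5MHz or higher.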
Triangle throughput goes up by a hefty 65%, a huge gain over the previous generation iPad.
The fragment lit triangle test starts showing us close to a doubling of performance at the iPad's native resolution.
Throw in a more ALU heavy workload and we really start to see the advantage of the new GPU: almost double the performance in Egypt HD at 2048 x 1536. We also get performance that's well above 30 fps here on the iPad at native resolution for the first time.
Normalize to the same resolution and we see that the new PowerVR graphics setup is 57% faster than even ARM's Mali-T604 in the Nexus 10. Once again we're seeing just about 2x the performance of the previous generation iPad.
Vsync bound gaming performance obviously won't improve, but the offscreen classic test gives us an idea of how well the new SoC can handle lighter workloads:
For less compute bound workloads the new iPad still boasts a 53% performance boost over the previous generation.
Ultimately it looks like the A6X is the SoC that the iPad needed to really deliver good gaming performance at its native resolution. I would not be surprised to see more game developers default to 2048 x 1536 on the new iPad rather than picking a lower resolution and enabling anti-aliasing. The bar has been set for this generation and we've seen what ARM's latest GPU can do. Now the question is whether or not NVIDIA will finally be able to challenge Imagination Technologies when it releases Wayne/Tegra 4 next year.
113 Comments
AnotherHariSeldon - Saturday, November 3, 2012 - link
Just look at amount of die area dedicated to the GPU's in the A6X
http://www.chipworks.com/blog/recentteardowns/2012...
I expect Samsung will have to revert to IMG rogue (from current ARM Mali) to remain competitive with Apple in the next iteration of product launches.
Expecting to see ray-tracing tech becoming a factor as well...............
http://www.itproportal.com/2012/06/16/imagination-...
UpSpin - Saturday, November 3, 2012 - link
As long as we don't know how the Exynos 5 Dual looks like, it's hard to say that Mali is worse than PowerVR. What we know is that the Exynos 5 Dual outperforms A5X and A6 in 'real world' gaming benchmarks. But gets beaten by the A6X.
I doubt that the Mali T604 cores occupy as much die space on the Exynos 5 Dual as the PowerVR in the A6X do.
I haven't found anything about the Exynos 5 Dual die size, transistor count, or some other sort of analysis. As long as we don't have those informations you can't judge about the GPU.
Tangey - Monday, November 5, 2012 - link
5250 has not been implemented in a phone, so we have no idea what its performance is relative to the phone-only A6.
I would not be surprised if a phone based 5250 clocks substantially lower than its tablet implementation.
Krysto - Saturday, November 3, 2012 - link
It seems to be Egypt HD is the only benchmark that matters, because it's a complete graphics test at a high resolution. The others are only testing for specific stuff, which even if they have higher numbers, might be bottlenecked by other components in the system, so it could be irrelevant that it scores 1000 more points over another chip.
Running Egypt HD is exactly like running a game. So then it seems Mali T604 is about 30% faster than the A5X GPU and the A6 GPU (iPhone 5), and about 35% slower than the A6X GPU (65% of 554MP4). If Apple doesn't come out with iPad 5 in spring (which would be pretty crazy if they did so soon), then I expect Mali T624 or T628 to take the crown again in the first half of next year.
But that's just for raw performance. In terms of energy efficiency, Mali T604 seems to be about 30% more efficient than 554MP4, so normalized for energy efficiency, which Samsung went for here, because they wanted to use a smaller battery than Apple, to help undercut the iPad by $100, then 554MP4 is only about 10% faster than Mali T604 at the same power consumption level - maybe less than that.
AnotherHariSeldon - Saturday, November 3, 2012 - link
"In terms of energy efficiency, Mali T604 seems to be about 30% more efficient than 554MP4, so normalized for energy efficiency."
I'm not sure you're quite understanding that right ;)
djgandy - Monday, November 5, 2012 - link
Do you work for ARM? Ex Nvidia employee maybe? Seem to have a case of nonsense diarrhea.
qwerty0722 - Sunday, November 4, 2012 - link
I think the result is really great, but as we can see almost the iOS platform score higher than Android platform so if we put the S4 Pro on iOS it could just like a SR-71, but we use A6X in Android it could result a horrible speed (like Ford GT Mustang etc.)
I wanna say is when we saw different platform put into a same test, here we can realize : it's not A platform faster than B platform but they're just different platform, maybe iOS is a great environment for OpenGL, Android maybe not.
In real world when you play a game, the resolution is higher than the iOS (1136x640 or 2048x1536), Android (2560x1600 or 1920x1200 or 1280x768 or 1280x720)
it is also smooth in Android, so I think this test is just like use the same software like Resident Evil 6 --> PS3 vs XBOX360 ( THEY ALL PERFORM WELL !! )
djgandy - Monday, November 5, 2012 - link
Drivers may be better on one platform yes. However that is often what benchmarks are for. They are much simpler than games and can therefore target specific parts of the GPU. Fill rate is a metric which can be easily calculated with a piece of paper.
If the platform A is missing its expected fill with Chip A but platform B is hitting it then you have a driver issue on platform A. I don't think we've seen much evidence of this though. Android benchmarks generally deliver expected performance, look at SGX540 (as old as it is), the results are in line with the capabilities of that chip.
qwerty0722 - Monday, November 5, 2012 - link
ok thanks for your reply, but I still don't get it ; )
Did you mean SGX554MP4 really is a high performance GPU to make such a difference not just the driver of iOS ?
because my personal view is why Apple use dual-core+SGX543MP3/SGX554MP4 can run this benchmark pretty far from Android (Quad-core+Adreno320), is this architecture's problem ? or..?
can you make your point more simple (sorry for my comprehension)
and thanks again for your reply~
ShAdOwXPR - Monday, November 5, 2012 - link
What's the A6X CPU gflops? and combined? And will the next gen Apple CPU/GPU catch up to current consoles? (P.S. I know next-gen consoles will be 600gflops-1.5tflops)
I found a few consoles gflops cpu/gpu numbers;
Xbox | CPU: 1.5 GFLOPS | GPU: 5.8 GFLOPS | Combined: 7.3 GFLOPS
Xbox360 | CPU: 115 GFLOPS | GPU: 240 GFLOPS | Combined: 355 GFLOPS
Dreamcast | CPU: 1.4 GFLOPS | GPU: 0.1 GFLOPS | Combined: 1.5 GFLOPS
Wii | CPU: 60 GFLOPS | GPU: 1 GFLOPS | Combined: 61 GFLOPS
PS2 | CPU: 6 GFLOPS | GPU: 0 GFLOPS | Combined: 6 GFLOPS
iPad 4 | CPU: --- GFLOPS | GPU: 78 GFLOPS | Combined: --- GFLOPS