iPad 4 GPU Performance Analyzed: PowerVR SGX 554MP4 Under the Hood
by Anand Lal Shimpi on November 2, 2012 1:46 PM EST
As always, our good friends over at Kishonti managed to have the first GPU performance results for the new 4th generation iPad. Although the new iPad retains its 2048 x 1536 "retina" display, Apple claims a 2x improvement in GPU performance through the A6X SoC. The previous generation chip, the A5X, had two ARM Cortex A9 cores running at 1GHz paired with four PowerVR SGX 543 cores running at 250MHz. The entire SoC integrated 4 x 32-bit LPDDR2 memory controllers, giving the A5X the widest memory interface on a shipping mobile SoC in the market at the time of launch.
The A6X retains the 128-bit wide memory interface of the A5X (and it keeps the memory controller interface adjacent to the GPU cores and not the CPU cores as is the case in the A5/A6). It also integrates two of Apple's new Swift cores running at up to 1.4GHz (a slight increase from the 1.3GHz cores in the iPhone 5's A6). The big news today is what happens on the GPU side. A quick look at the GLBenchmark results for the new iPad 4 tells us all we need to know. The A6X moves to a newer GPU core: the PowerVR SGX 554.
Mobile SoC GPU Comparison
| | PowerVR SGX 543 | PowerVR SGX 543MP2 | PowerVR SGX 543MP3 | PowerVR SGX 543MP4 | PowerVR SGX 554 | PowerVR SGX 554MP2 | PowerVR SGX 554MP4 |
| Used In | - | iPad 2 | iPhone 5 | iPad 3 | - | - | iPad 4 |
| SIMD Name | USSE2 | USSE2 | USSE2 | USSE2 | USSE2 | USSE2 | USSE2 |
| # of SIMDs | 4 | 8 | 12 | 16 | 8 | 16 | 32 |
| MADs per SIMD | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Total MADs | 16 | 32 | 48 | 64 | 32 | 64 | 128 |
| GFLOPS @ 300MHz | 9.6 GFLOPS | 19.2 GFLOPS | 28.8 GFLOPS | 38.4 GFLOPS | 19.2 GFLOPS | 38.4 GFLOPS | 76.8 GFLOPS |
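The GFLOPS figures in the table can be reproduced from the SIMD and MAD counts. The short Python sketch below assumes that each MAD (multiply-add) counts as two floating-point operations per clock, which is the usual convention and matches the table's numbers; the function and variable names are illustrative only and do not come from Imagination or Kishonti.

```python
# Peak-throughput arithmetic behind the comparison table (a sketch, not vendor data).

FLOPS_PER_MAD = 2  # assumption: one multiply-add is counted as two floating-point operations


def peak_gflops(simds: int, mads_per_simd: int, clock_mhz: float) -> float:
    """Peak GFLOPS = SIMDs x MADs per SIMD x FLOPs per MAD x clock in GHz."""
    total_mads = simds * mads_per_simd
    return total_mads * FLOPS_PER_MAD * clock_mhz / 1000.0


# (name, number of SIMDs) taken from the table; every part lists 4 MADs per SIMD.
CONFIGS = [
    ("SGX 543", 4),
    ("SGX 543MP2", 8),
    ("SGX 543MP3", 12),
    ("SGX 543MP4", 16),
    ("SGX 554", 8),
    ("SGX 554MP2", 16),
    ("SGX 554MP4", 32),
]

if __name__ == "__main__":
    for name, simds in CONFIGS:
        # 300MHz is the reference clock used in the table, not a measured clock.
        print(f"{name:<12} {peak_gflops(simds, 4, 300):5.1f} GFLOPS @ 300MHz")
```

Because the formula is linear in clock speed, the same function can be used to rescale any column to a different assumed GPU clock.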
As always, Imagination doesn't provide a ton of public information about the 554 but based on what I've seen internally it looks like the main difference between it and the 543 is a doubling of the ALU count per core (8 Vec4 ALUs per core vs. 4 Vec4). Chipworks' analysis of the GPU cores helps support this: "Each GPU core is sub-divided into 9 sub-cores (2 sets of 4 identical sub-cores plus a central core)."
I believe what we're looking at is the 8 Vec4 SIMDs (each one capable of executing 8+1 FLOPS). The 9th "core" is just the rest of the GPU including tiler front end and render backends. Based on the die shot and Apple's performance claims it looks like there are four PowerVR SGX554 cores on-die, resulting in peak theoretical performance greater than 77 GFLOPS.
There's no increase in TMU or ROP count per core; the main change between the 554 and 543 is the addition of more ALUs. There are some more low level tweaks, which help explain the different core layout from previous designs, but nothing major.
With that out of the way, let's get to the early performance results. We'll start with low level fill rate and triangle throughput numbers:
Fill rate goes up by around 15% compared to the 3rd generation iPad, which isn't enough to indicate a huge increase in the number of texture units on the 554MP4 vs. the 543MP4. What we may be seeing here instead are benefits from higher clocked GPU cores rather than more texture units. If this is indeed the case it would indicate that the 554MP4 changes the texture to ALU ratio from what it was in the PowerVR SGX 543 (Update: this is confirmed). The data here points to a GPU clock at least 15% higher than the ~250MHz in the 3rd generation iPad.
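As a rough illustration of that reasoning, the Python sketch below turns the ~15% fill rate gain into a lower bound on the GPU clock and then into an implied peak ALU throughput. It assumes the texture unit count is unchanged so that fill rate scales linearly with clock, that one MAD counts as two FLOPs, and it reuses the Total MADs figures from the comparison table (64 for the 543MP4, 128 for the 554MP4). The names are hypothetical and the results are estimates, not measured clocks.

```python
# Back-of-the-envelope clock estimate from the fill rate result (assumptions noted inline).

A5X_GPU_CLOCK_MHZ = 250.0  # approximate 543MP4 clock quoted in the article
FILL_RATE_GAIN = 0.15      # ~15% fill rate improvement quoted in the article


def min_clock_mhz(base_clock_mhz: float, fill_rate_gain: float) -> float:
    """Lower-bound clock, assuming fill rate scales linearly with clock
    and the number of texture units did not change."""
    return base_clock_mhz * (1.0 + fill_rate_gain)


def peak_gflops(total_mads: int, clock_mhz: float) -> float:
    """Peak GFLOPS, assuming one MAD counts as two FLOPs per clock."""
    return total_mads * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    a6x_clock = min_clock_mhz(A5X_GPU_CLOCK_MHZ, FILL_RATE_GAIN)
    a5x_peak = peak_gflops(64, A5X_GPU_CLOCK_MHZ)  # 543MP4: 64 MADs in the table
    a6x_peak = peak_gflops(128, a6x_clock)         # 554MP4: 128 MADs in the table
    print(f"Estimated minimum A6X GPU clock: {a6x_clock:.1f} MHz")
    print(f"Implied peak: {a6x_peak:.1f} GFLOPS vs {a5x_peak:.1f} GFLOPS on the A5X")
    print(f"Theoretical ALU scaling: {a6x_peak / a5x_peak:.2f}x")
```

Since the article describes the clock as at least 15% higher, the values this produces should be read as floors rather than as the actual A6X specifications.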
Triangle throughput goes up by a hefty 65%; these are huge gains over the previous generation iPad.
The fragment lit triangle test starts showing us close to a doubling of performance at the iPad's native resolution.
Throw in a more ALU heavy workload and we really start to see the advantage of the new GPU: almost double the performance in Egypt HD at 2048 x 1536. We also get performance that's well above 30 fps here on the iPad at native resolution for the first time.
Normalize to the same resolution and we see that the new PowerVR graphics setup is 57% faster than even ARM's Mali-T604 in the Nexus 10. Once again we're seeing just about 2x the performance of the previous generation iPad.
Vsync bound gaming performance obviously won't improve, but the offscreen classic test gives us an idea of how well the new SoC can handle lighter workloads:
For less compute bound workloads the new iPad still boasts a 53% performance boost over the previous generation.
Ultimately it looks like the A6X is the SoC that the iPad needed to really deliver good gaming performance at its native resolution. I would not be surprised to see more game developers default to 2048 x 1536 on the new iPad rather than picking a lower resolution and enabling anti-aliasing. The bar has been set for this generation and we've seen what ARM's latest GPU can do. Now the question is whether or not NVIDIA will finally be able to challenge Imagination Technologies when it releases Wayne/Tegra 4 next year.
113 Comments
djgandy - Monday, November 5, 2012 - link
Hilarious. PowerVR is the most energy efficient and has the least thermal issues. Unlike Tegra, that over heats. Adreno that has to throttle because it gets too hot.
Mali, Tegra, and Adreno together are why Samsung has to put 9W/hr batteries in their phones and Apple only needs 6W/hr.
KitsuneKnight - Friday, November 2, 2012 - link
It's quite disappointing indeed. Fortunately, the Adreno 320 (at least in the Optimus G, not the Nexus 4) appears to actually be pretty damn good (unlike the new Mali), so maybe there'll be a bit of competition on the GPU front... maybe we'll have a nice GPU battle raging (in addition to the CPU battle) by the time Intel arrives in full force to the mobile landscape.
Or maybe it's just a fluke. Even in flagship Android phones, it seems like the manufacturers aren't really taking things all that seriously (such as the bajillion different SoCs in the SGS3).
AnotherHariSeldon - Saturday, November 3, 2012 - link
Intel and Samsung use IMG IP as do Mediatek - The fastest growing global smartphone SoC manufacturer.
http://www.eetimes.com/electronics-news/4391817/Me...
Samsung using IMG in the form of TI OMAP - http://www.slashgear.com/samsung-unveils-galaxy-pr...
iwod - Friday, November 2, 2012 - link
The A15 is approaching needed Desktop computing performance. Where are we in terms of Graphics performance?
Say A57 is a Core2Duo Class CPU (Not a Fact, I am just guessing and giving examples here).
What is a PowerVR SGX 554 MP4? Ivy Bridge G3000? Radeon 6370?
Zodiark1593 - Saturday, November 3, 2012 - link
A 6370 (80 stream processors) approaches roughly 130-140 Gigaflops at 750 MHz, not including specific optimizations on either part, I'd say the GPU performance should be roughly half, maybe slightly less so.
Considering many PC games are made with much stronger GPUs in mind (such GPUs rate 800+ Gflops), I'd estimate visuals to be worse with a 6370 on average than a high end, well optimized game made for the SGX 554 MP4.
And then, consider that the 6370 is a little over half as powerful as the Xbox 360 GPU (similar compute power to the 6450). Tablets still have a little ways to go before hitting console performance levels, not counting any additional quirks like the eDRAM in Xenos.
What I would love to see though is AMD getting into the mobile GPU game as well. Even an 80 SP Radeon part in the 250 MHz speed would shake things up.
ananduser - Saturday, November 3, 2012 - link
AMD was in the mobile GPU game. They sold their division to Qualcomm, due to financial difficulties. It was called Imageon and Qualcomm rebranded it to Adreno.
Penti - Saturday, November 3, 2012 - link
It's roughly equivalent to HD2000 or HD2500 graphics from Intel, if we only go by GFLOPS. It's slower than the average integrated GPU nowadays. Or about as fast as a GeForce 210, 310 and a third of a GT620M notebook chip. Roughly that is in the ballpark of an old X1600 or X800/X850 graphics card. It's close to an integrated Radeon HD6320 and HD6310 that's in the AMD Zacate APU E-350 and E-450. Here it will largely depend on other factors and drivers though. Mobile GPUs will always be bandwidth limited so it doesn't make much sense to put too large a GPU in here; that is also why Apple uses quad-channel memory for their higher performing chips.
krumme - Saturday, November 3, 2012 - link
APU E-350/450 is what? Around 70mm2 on 40nm, and this A6X is 130mm2 on 28/32nm.
If performance is around the same for the GPU part (and say the CPU also), it means the old generation Bobcat is at least 4 times as effective for perf/mm2. Of course much less at lower voltages.
Efficiency will improve very nicely with the new Jaguar with GCN.
I doubt there will be that much difference for power/perf. If true, AMD/Intel still have a huge leg up on designing CPUs/APUs and especially drivers for the GPU side (AMD).
We need some comparisons of hardware on the same software platform.
All these comparisons on different software are nice, but it's a mess to evaluate the hardware if not within the same family and platform. It's very difficult to do proper benchmarking here.
Look how SGX performs on the Intel platform, the 32nm Atoms. It's pathetic, it doesn't even work most of the time.
Zodiark1593 - Saturday, November 3, 2012 - link
We can't go by flops alone, though that's the only comparison I've seen.
The 6370 is limited to a single 64 bit bus, so its advantage against the iPad 4 dwindles sharply, especially if equipped with DDR3 instead of GDDR3. 4 ROPs and 8 Texture units finish off the specs. TDP comes in at roughly 15 watts.
Apart from raw shader power and clock speed, there isn't a whole heck of a lot in the 6370's favor vs mobile GPUs.
Penti - Sunday, November 4, 2012 - link
I know we can't. It's all about the capabilities and chip-specific strengths, drivers and whatever they use to overcome the limited bandwidth. Scaling down GeForce and AMD GPUs into these sizes is just horrible. They require much more bandwidth to perform well. AMD's old tile-based line which they sold off, used by Qualcomm, does quite well though.