Apple Announces M1 Pro & M1 Max: Giant New Arm SoCs with All-Out Performance
by Andrei Frumusanu on October 18, 2021 4:00 PM EST
Today’s Apple Mac keynote has been very eventful, with the company announcing a new line-up of MacBook Pro devices, powered by two different new SoCs in Apple’s Silicon line-up: the new M1 Pro and the M1 Max.
The M1 Pro and Max both follow up on last year’s M1, Apple’s first-generation Mac silicon that ushered in the beginning of Apple’s journey to replace x86-based chips with their own in-house designs. The M1 has been widely successful for Apple, showcasing fantastic performance at never-before-seen power efficiency in the laptop market. Although the M1 was fast, it was still a relatively small SoC with a correspondingly lower TDP, one that also powers devices such as the iPad Pro line-up, and so it naturally still lost out to larger, more power-hungry chips from the competition.
Today’s two new chips look to change that situation, with Apple going all-out for performance, with more CPU cores, more GPU cores, much more silicon investment, and Apple now also increasing their power budget far past anything they’ve ever done in the smartphone or tablet space.
The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors in 245mm²
The first of the two chips which were announced was the so-called M1 Pro – laying the groundwork for what Apple calls no-compromise laptop SoCs.
Apple started off the presentation with a showcase of the packaging. There, the M1 Pro is shown to continue to feature very custom packaging, including the still-unique characteristic that Apple packages the SoC die along with the memory dies on a single organic PCB. This contrasts with traditional chips such as those from AMD or Intel, which have their DRAM dies either in DIMM slots or soldered onto the motherboard. Apple’s approach here likely improves power efficiency by a notable amount.
The company divulges that they’ve doubled up on the memory bus for the M1 Pro compared to the M1, moving from a 128-bit LPDDR4X interface to a new much wider and faster 256-bit LPDDR5 interface, promising system bandwidth of up to 200GB/s. We don’t know if that figure is exact or rounded, but an LPDDR5-6400 interface of that width would achieve 204.8GB/s.
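The 204.8GB/s figure follows from simple arithmetic: bus width in bytes multiplied by the transfer rate. A minimal sketch of that calculation, assuming the interface really runs at the full LPDDR5-6400 rate of 6400 MT/s (Apple has not confirmed this); the function name is ours, purely for illustration:

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # bytes/transfer * MT/s gives MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * mega_transfers_per_sec / 1000


# 256-bit bus at an assumed 6400 MT/s (LPDDR5-6400)
print(peak_bandwidth_gbs(256, 6400))
```

This is a theoretical ceiling; sustained bandwidth seen by any one block of the SoC will be lower.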
In a much-appreciated presentation move, Apple actually showcased the die shots of both the M1 Pro and M1 Max, so we can have an immediate look at the chips’ block layout and how things are partitioned. Let’s start off with the memory interfaces, which are now more consolidated onto two corners of the SoC, rather than spread out along two edges like on the M1. Because of the increased interface width, a considerably larger portion of the SoC is taken up by the memory controllers. What’s even more interesting, however, is that Apple now apparently employs two system level cache (SLC) blocks directly behind the memory controllers.
Apple’s system level cache blocks have been notable as they serve the whole SoC, able to amplify bandwidth, reduce latency, or simply save power by avoiding memory transactions going off-chip, greatly improving power efficiency. This new-generation SLC block looks quite a bit different to what we’ve seen on the M1. The SRAM cell areas look to be larger than those of the M1, so while we can’t exactly confirm this right now, it could signify that each SLC block has 16MB of cache in it – for the M1 Pro that would mean 32MB of total SLC.
On the CPU side of things, Apple has shrunk the number of efficiency cores from 4 to 2. We don’t know if these cores are similar to the M1-generation efficiency cores, or if Apple adopted the newer-generation IP from the A15 SoC – we had noted that the new iPhone SoC had some larger microarchitectural changes in that regard.
On the performance core side, Apple has doubled things up to 8 cores now. Apple’s performance cores were extremely impressive on the M1, but the chip lagged behind other 8-core SoCs in terms of multi-threaded performance. This doubling up of the cores should bring immense MT performance boosts.
On the die shot, we’re seeing that Apple is seemingly mirroring two 4-core blocks, with the L2 caches also being mirrored. Although Apple quotes 24MB of L2 here, I think it’s rather a 2x12MB setup, with an AMD core-complex-like setup being used. This would mean that the coherency of the two performance clusters is going over the fabric and SLC instead. Naturally, this is speculation for now, but it’s what makes most sense given the presented layout.
In terms of CPU performance metrics, Apple made some comparisons to the competition – in particular the SKUs being compared here were Intel’s Core i7-1185G7, and the Core i7-11800H, 4-core and 8-core variants of Intel’s latest Tiger Lake 10nm 'SuperFin' CPUs.
Apple here claims that in multi-threaded performance, the new chips both vastly outperform anything Intel has to offer, at vastly lower power consumption. The presented performance/power curves showcase that at an equal power usage of 30W, the new M1 Pro and Max are 1.7x faster in CPU throughput than the 11800H, whose power curve is extremely steep. At equal performance levels – in this case using the 11800H's peak performance – Apple says that the new M1 Pro/Max achieve the same performance with 70% lower power consumption. Both figures are massive discrepancies and a leap ahead of what Intel is currently achieving.
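Both of Apple's claims can be restated as performance-per-watt ratios. A rough sketch of that arithmetic, using only the 1.7x and 70% figures from Apple's slide; the helper names are illustrative, and the two ratios differ because they are read off different points of the two curves:

```python
def efficiency_gain_at_equal_power(speedup: float) -> float:
    # Same watts, more work: perf/W improves by the speedup itself.
    return speedup


def efficiency_gain_at_equal_performance(power_reduction: float) -> float:
    # Same work, fewer watts: perf/W improves by 1 / (remaining power fraction).
    return 1.0 / (1.0 - power_reduction)


print(f"{efficiency_gain_at_equal_power(1.7):.2f}x perf/W at 30W")
print(f"{efficiency_gain_at_equal_performance(0.70):.2f}x perf/W at the 11800H's peak performance")
```

The larger ratio at the 11800H's peak reflects how steep Intel's curve becomes at the top of its power range.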
Alongside the powerful CPU complexes, Apple is also supersizing their custom GPU architecture. The M1 Pro now features a 16-core GPU, with an advertised compute throughput of 5.2 TFLOPs. What’s interesting here is that this new, much larger GPU is supported by the much wider memory bus, as well as the presumed 32MB of SLC – the latter essentially acting similarly to what AMD is now achieving with their GPU Infinity Cache.
Apple’s GPU performance is claimed to vastly outclass any previous-generation competitor integrated graphics, so the company opted to make direct comparisons to mid-range discrete laptop graphics. In this case, Apple pits the M1 Pro against a GeForce RTX 3050 Ti 4GB, with the Apple chip achieving similar performance at 70% less power. The power levels here are showcased as being at around 30W – it’s not clear if this is total SoC or system power, or if Apple is just comparing the GPU block itself.
Alongside the GPU and CPUs, Apple also noted their much-improved media engine, which can now handle hardware accelerated decoding and encoding of ProRes and ProRes RAW, something that’s going to be extremely interesting to content creators and professional videographers. Apple Macs have generally held a good reputation for video editing, but hardware accelerated engines for RAW formats would be a killer feature that would be an immediate selling point for this audience, and something I’m sure we’ll hear many people talk about.
The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors & 432mm²
Alongside the M1 Pro, Apple also announced a bigger brother – the M1 Max. While the M1 Pro catches up to and outpaces the laptop competition in terms of performance, the M1 Max is aiming at delivering something never-before-seen: supercharging the GPU to a total of 32 cores. Essentially it’s no longer an SoC with an integrated GPU, rather it’s a GPU with an SoC around it.
The packaging for the M1 Max changes slightly in that it’s bigger – the most obvious change is the increase of DRAM chips from 2 to 4, which also corresponds to the increase in memory interface width from 256-bit to 512-bit. Apple is advertising a massive 400GB/s of bandwidth, which, if it’s LPDDR5-6400, would more exactly be 409.6GB/s. This kind of bandwidth is unheard of in an SoC, but quite the norm in very high-end GPUs.
On the die shot of the M1 Max, things look quite peculiar – first of all, the whole top part of the chip above the GPU essentially looks identical to the M1 Pro, indicating that Apple is reusing most of the design, and that the Max variant simply grows downwards in the block layout.
The additional two 128-bit LPDDR5 blocks are evident, and again it’s interesting to see here that they’re also increasing the number of SLC blocks along with them. If indeed at 16MB per block, this would represent 64MB of on-chip generic cache for the whole SoC to make use of. Beyond the obvious GPU uses, I do wonder what the CPUs are able to achieve with such gigantic memory bandwidth resources.
The M1 Max is truly immense – Apple disclosed the M1 Pro transistor count to be at 33.7 billion, while the M1 Max bloats that up to 57 billion transistors. AMD advertises 26.8bn transistors for the Navi 21 GPU design at 520mm² on TSMC's 7nm process; Apple here has over double the transistors at a lower die size thanks to their use of TSMC's leading-edge 5nm process. Even compared to NVIDIA's biggest 7nm chip, the 54 billion transistor server-focused GA100, the M1 Max still has the greater transistor count.
In terms of die sizes, Apple presented a slide of the M1, M1 Pro and M1 Max alongside each other, and they do seem to be 1:1 in scale. In that case, with the M1 already known to be 120mm², the M1 Pro would be 245mm² and the M1 Max about 432mm².
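Combining these die-size estimates with the quoted transistor counts gives a rough transistor density for each chip. A back-of-the-envelope sketch, assuming the die sizes above (which for the M1 Pro and M1 Max are estimates scaled from Apple's slide, not official figures):

```python
# (transistors in billions, die area in mm²) as quoted in the article
chips = {
    "M1 Pro": (33.7, 245),   # area estimated from Apple's slide
    "M1 Max": (57.0, 432),   # area estimated from Apple's slide
    "Navi 21": (26.8, 520),  # AMD's advertised figures
}


def density_mtr_per_mm2(transistors_bn: float, area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors_bn * 1000 / area_mm2


for name, (bn, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(bn, area):.1f} MTr/mm²")
```

On these numbers both Apple chips land at roughly 2.5x the density of Navi 21, though vendors count transistors differently, so cross-vendor comparisons like this are only indicative.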
Most of the die area is taken up by the 32-core GPU, which Apple advertises as reaching 10.4 TFLOPs. Going back to the die shot, it looks like Apple here has basically mirrored their 16-core GPU layout. The first thing that came to mind here was the idea that these would be 2 GPUs working in unison, but there does appear to be some shared logic between the two halves of the GPU. We might get more clarity on this once we see software behavior of the system.
In terms of performance, Apple is battling it out with the very best available in the market, comparing the performance of the M1 Max to that of a mobile GeForce RTX 3080, at 100W less power (60W vs 160W). Apple also includes a 100W TDP variant of the RTX 3080 for comparison; here the M1 Max outperforms the NVIDIA discrete GPU while still using 40% less power.
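The power comparisons reduce to simple percentages. A quick sketch using the wattages quoted above (60W for the M1 Max, 160W and 100W for the two RTX 3080 laptop configurations); the function name is illustrative:

```python
def power_saving_pct(apple_watts: float, competitor_watts: float) -> float:
    """Percentage by which the Apple chip's power draw is lower."""
    return (competitor_watts - apple_watts) * 100 / competitor_watts


print(power_saving_pct(60, 160))  # vs. the 160W RTX 3080 laptop
print(power_saving_pct(60, 100))  # vs. the 100W RTX 3080 laptop
```

The second case matches the 40% figure; the first works out to 62.5% lower power for the 100W gap.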
Today's reveal of the new generation Apple Silicon has been something we’ve been expecting for over a year now, and I think Apple has managed to not only meet those expectations, but also vastly surpass them. Both the M1 Pro and M1 Max look like incredibly differentiated designs, much different than anything we’ve ever seen in the laptop space. If the M1 was any indication of Apple’s success in their silicon endeavors, then the two new chips should also have no issues in laying incredible foundations for Apple’s Mac products, going far beyond what we’ve seen from any competitor.
web2dot0 - Wednesday, October 20, 2021
The reality is the PC industry hasn't innovated for over a decade. All they've done is add more fans, coolers and more optimizations, when we should be following Moore's Law.
Apple comes along and redefines the industry with their Apple Silicon, which is clearly YEARS ahead of the competition. No credible person would think that Apple isn't gonna keep 2x-ing their product for the next few years. Apple is already designing M4 for all we know. They are just flexing their muscles in small chunks at a time.
Yet, PC folks continue to ridicule Apple as a "piece of junk". It's embarrassing for them to call themselves computer enthusiasts. Tech is tech. It's not a religion.
Apple has their shortcomings (getting rid of ports, excessive thinness to their laptops at the expense of performance, butterfly keyboard, etc ...), but no PC fanboy wants to admit that Apple does produce quality products compared to their competition.
Apple fanboys want "acknowledgment", while PC fanboys go to great lengths to deny them and continue to ridicule them. No Apple fanboy is gonna just take that lying down. It's a vicious cycle.
If PC fanboys just admit that Apple makes quality products, I'm 100% certain Apple fanboys will also admit that choices are GOOD.
Some people like a souped-up Honda Civic, while others like their BMW maintained under the factory warranty. To each their own. It doesn't mean all BMWs are trash and Civics are infinitely better and cheaper.
GeoffreyA - Wednesday, October 20, 2021
I agree that if people could just admit a competitor is good, when good, all would be well. It's hard, I know, but has a medicinal effect on the mind, almost as if a burden were lifted off one's chest. Such is truth.
I don't agree that the PC space hasn't innovated. How about Sandy Bridge and Zen? Even Bulldozer, despite being a disaster. If Zen's turning the tables on Intel and raising IPC ~15% each year isn't astounding, I give it up. And as far as I remember, Renoir wasn't that far behind the M1, and that's with the handicap of x86's decoding overhead, among other things (5 vs. 7 nm). I'm confident that if AMD built an ARM CPU, after a couple of iterations, if not on the first, they'd match or surpass Apple. And I even doubt whether ARM's all it's cut out to be. If x86 must go down, let's hope the industry chooses RISC-V.
While excellent and worthy of applause, the M1 is hardly years ahead of the competition. Where does it stand against Zen 3? Is it really that big of a difference as the story's being painted? Once more, the search for truth. The ultimate test, to see who's the best in design, would be to let Apple craft an x86 CPU or AMD an ARM one.
Farfolomew - Wednesday, October 20, 2021
I think it is in terms of packaging and efficiency. Outright performance maybe not, but the fact that it makes *no compromises* in its beating of anything the PC space can offer is the major news here. There are no negatives about this chip. It's better in just about everything, and in major ways such as efficiency and parallelism.
If anything, this should be lauded by the PC community. This SHOULD give the kick in the proverbial butt to the likes of Intel/AMD/Qualcomm/NVIDIA to change their thinking in CPU design, to get back on track with Moore's law. I'm excited to see how the PC industry reacts to this.
Will it gain back the performance lead at some point, or will it forever be stuck losing to Apple à la Android/iOS SoC designs?
GeoffreyA - Thursday, October 21, 2021
When the M1 first came out, I felt it would recalibrate the frequency/width/IPC axes, and still do. AMD and Intel only had themselves to compare against all this time. Though Apple's not a direct competitor at present, I'm confident AMD could beat them if they had to, now that they see what sort of performance they've got to aim for. Those who are making fun of x86 underestimate what AMD's capable of. Intel learnt the hard way.
Farfolomew - Saturday, October 23, 2021
Hmm, you really think so? I mean, AMD's Ryzen is good, but it's not really any better than Intel's best (Tiger Lake) and will soon be eclipsed by Alder Lake. Ryzen has just caught up to what Intel's been able to offer, but I don't see it as much better. At the very least, compared to these new M1 chips, AMD and Intel chips are nearly identical.
I suppose I just don't see AMD as the one challenging Apple's CPU prowess. They don't have the R&D budget to do so. And Intel? I'm not sure they can ever recover, they're not hiring enough young engineers to rethink the paradigm shifts needed to compete with the coming of ARM.
That leaves Qualcomm and their Nuvia acquisition, which no one really knows how seriously to take. If Nuvia's design roadmap has them developing M1-like CPUs, then I think Qualcomm's future is bright.
Or perhaps it's not so black and white. X86 might survive just fine, and we'll continue to see a healthy battle and innovation. After all, that's the best case for us consumers.
GeoffreyA - Sunday, October 24, 2021
I think it takes more than a big R&D budget to make a winning CPU: it was Bulldozer-era AMD that designed Zen. And we've seen that dollars thrown left and right, in Intel fashion, may, but don't necessarily, produce excellence.
Whether x86 will go down, no one can tell right now. As it stands, there is no true competitor on the desktop, Apple being isolated in its own enchanted realm. Qualcomm, who knows? There's a possibility Intel or AMD could announce an ARM CPU (RISC-V being less likely because of no Windows version yet), causing x86 to fade away. I won't be surprised to see Intel trying some trick like this. "If we can't fight Ryzen, why not pull out the carpet from under it?"
As for paradigm shifts, while innovation and flexible thinking are excellent, drastic change has often been disastrous: Pentium 4 and Bulldozer. It's the tried-and-tested ways that work, and further perfecting those. As for ARM, apart from the fixed-length instructions, I don't think there's anything really special about it, as is often painted in the ARM-x86 narrative.
Speedfriend - Thursday, October 21, 2021
To think Apple will 2x their product is insane. This is not the M1 in reality as it has been in development for years in the iPhone at the main core level. All the easy gains have been made already. I would not be surprised to see a 15% per generation improvement from here.
WuMing2 - Tuesday, October 19, 2021
Memory bandwidth is half that of the Fujitsu A64FX employed in the most powerful supercomputer in the world. In a laptop. Incredible.
KPOM - Thursday, October 21, 2021
Nice.
Kevin G - Wednesday, October 20, 2021
These are impressive chips, with the M1 Pro hitting the midrange sweet spot. I'd love to see Mac Minis and iMacs using these chips soon, where they can ride the frequency/voltage curve a notch or two higher to really see what these designs are capable of.
The layout of the extra encoders on the M1 Max seems to be targeted at an odd niche vs. what else Apple could have used that die space for. I can see the argument for the first set of encoders found in the baseline M1 Pro, but the extra units serve an ultra-small niche who will actively utilize them.
That die space would have been better leveraged for two additional things: an on-die FPGA or even more memory channels. The FPGA programmability would permit *some* additional acceleration for codecs but obviously not hit the same performance/die space or performance/watt as the dedicated units, but it would help for those that need more than the first set of encoders. The other idea of additional memory controllers is less about increasing memory bandwidth than increasing raw memory capacity: 64 GB isn't a lot when venturing into workloads like 8K video editing. Boosting capacity up to 48/96 GB would see more usage than the secondary encoders and have a better fit across more workloads. The downside of adding additional memory controllers would be greater die size (~500 mm^2?), which leads to a higher cost for the SoC itself. Total system cost would also increase due to the additional memory chips. Even with these tradeoffs, I think it'd have been a better choice than a second set of hardware encoders.