Another year, another TechDay from Arm. Over the last several years Arm’s event has come like clockwork in the May timeframe, and every time it has unveiled the newest flagship CPU and GPU IPs. This year is no exception as the event is back on the American side of the Atlantic in Austin, Texas, where Arm has one of its major design centres.

Two years ago during the unveiling of the Cortex A73 I talked a bit more about Arm’s CPU design teams and how they’re spread across locations and product lines. The main design centres for the Cortex-A series of CPUs are found in Austin, Texas; Cambridge, United Kingdom; and Sophia-Antipolis in the south of France near Nice. For the last two years the Cortex A73 and Cortex A75 were designs that mainly came out of the Sophia team, while the Cortex A53 and more recently the A55 were designs coming out of Cambridge. This means that we haven’t seen any recent designs coming out of Austin, and the last of the “Austin family” of CPUs were the A57 and A72.

The project being worked on in Austin had been hyped up for several years – I remember that even as early as the A73 release back in 2016 the company had pulled forward some elements from an advanced future microarchitecture on the back-end pipelines, especially on the FP/SIMD side. The Cortex A75 was said to pull in further elements from this mysterious new project.

Today we can finally unveil what the Austin team has been working on – and it’s a big one. The new Cortex A76 is a brand new microarchitecture which has been built from scratch and lays the foundation for at least two more generations of what I’ll call “the second generation of Austin family” of CPUs.

The Cortex A76 is important for Arm from a design perspective as it represents a new start from a clean sheet. It’s rare for an IP company to be able to do this, as it represents a great investment of resources and time, and if it weren’t for the Sophia design team taking over the steering wheel for the last two generations of products it wouldn’t have been reasonable to execute. The execution of the CPU design teams should be emphasised in particular, as Arm claims this is the 5th generation “annual beat” product, where the company delivers a new microarchitecture every year. Think of it as an analogue to Intel’s past Tick-Tock strategy, but rather Tock-Tock-Tock for Arm, with a steady CAGR (compound annual growth rate) of 20-25% every generation coming from µarch improvements.
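To get a feel for what a steady 20-25% per generation adds up to, the compounding can be sketched in a few lines. This is only an illustration: the function name is hypothetical, the number of generations is an arbitrary choice, and it assumes the rate holds constant from one generation to the next, which Arm has not promised.

```python
def compound_gain(rate_per_generation: float, generations: int) -> float:
    """Cumulative performance multiplier after a number of generations.

    Assumes the same fractional gain (e.g. 0.20 for 20%) every generation.
    """
    return (1.0 + rate_per_generation) ** generations


if __name__ == "__main__":
    # Illustrative generation counts only; not figures from Arm.
    for generations in (1, 2, 3):
        low = compound_gain(0.20, generations)
        high = compound_gain(0.25, generations)
        print(f"{generations} generation(s): {low:.2f}x to {high:.2f}x")
```

The point of the sketch is simply that gains multiply rather than add, so a few generations at this cadence amount to considerably more than the sum of the individual percentages.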

So what is the Cortex A76? In Arm’s words, it’s a “laptop-class” performance processor with mobile efficiency. The vision of the A76 as a laptop-class processor was emphasised throughout the TechDay presentation, so it seems Arm is really taking advantage of the large performance boost of the IP to cater to new market segments such as the emerging “Always connected PCs” which Qualcomm is spearheading with its SoC platforms.

The Cortex A76 microarchitecture has been designed for high performance while maintaining power efficiency. Starting from a clean sheet allowed the designers to remove bottlenecks throughout the design and to break previous microarchitectural limitations. The focus here was again maximum performance while remaining within an energy efficiency envelope that is fit for smartphones.

In broad metrics, what we’re promised in actual products using the A76 is as follows: a 35% performance increase alongside 40% improved power efficiency. We’ll also see a 4x improvement in machine learning workloads thanks to new optimisations in the ASIMD pipelines and how dot products are handled. These figures are baselined on A75 configurations running at 2.8GHz on 10nm processes, while the A76 is projected by Arm to come in at 3GHz in products based on TSMC’s 7nm process.
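As a rough sanity check on these figures, the 35% claim can be split into the part that comes from the clock bump and the part that comes from the core itself. The sketch below assumes performance scales linearly with clock frequency, which is a simplification of mine rather than anything Arm stated, and the function name is hypothetical.

```python
def per_clock_gain(total_speedup: float, base_ghz: float, new_ghz: float) -> float:
    """Speedup left over once the frequency increase is factored out.

    Assumes performance scales linearly with clock frequency (a simplification).
    """
    frequency_ratio = new_ghz / base_ghz
    return total_speedup / frequency_ratio


if __name__ == "__main__":
    # 35% overall gain, A75 at 2.8GHz versus A76 at 3GHz (figures from the text).
    gain = per_clock_gain(1.35, base_ghz=2.8, new_ghz=3.0)
    print(f"Implied per-clock gain: {(gain - 1.0) * 100:.0f}%")
```

Under that assumption the clock increase accounts for roughly 7% of the improvement, and the remainder works out to about 26% per clock.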

The new CPU is naturally still compatible with DynamIQ’s common cluster topology, and Arm envisions designs pairing it with Cortex A55s as the more power-efficient little CPUs. The configuration scalability of the DynamIQ IP was reiterated, and we were presented with example configurations such as 1+7 or 2+6 with either Cortex A75 or A76 CPU IP. This presentation slide was one of the rare ones where Arm referred to the area of the A76, pointing out that the A75 still had better PPA (power, performance, area) and thus might still be a valid design choice for companies, depending on their needs. One comparison that was made during the event is that in terms of area, three A76s with larger caches would fit inside the size of a Skylake core, all while being within 10% of the IPC of the Intel CPU, but obviously there are also process node scaling considerations to take into account.

A standout claim is that Arm aims to outperform the competition at half the area and half the power. Arm was slightly beating around the bush as to whom it considers the competition, but generally the answer was that it considers everybody the competition. Taking into account Intel, AMD or Samsung, it’s actually not that hard to imagine Arm beating them in PPA, as historically the company has always had the smallest CPU designs and that directly translates into more efficient microarchitectures.

Before we get into more detailed breakdowns of the performance and power improvements and what I’m expecting to see in products, let’s look at the microarchitectural improvements in the core and how Arm managed to extract this much performance while maintaining power efficiency.

Cortex A76 µarch - Frontend

  • id4andrei - Thursday, May 31, 2018 - link

    Apple has the manpower and funds to spend extensively for a huge chip for they are free to do things for their own glory. QCOMM designs chip for others to use and must design for price points. They must do so efficiently and maximize yields. ARM provides base designs that others can outright use or customize, you can't really blame ARM here. NVIDIA has no modem.
  • syxbit - Thursday, May 31, 2018 - link

    I get that, but you're missing the point. Sure, the budget phones have a strict budget.
    An $800 Android flagship should not be tight on the SoC budget. If QCOMM sold an ultra high snapdragon that could compete with the A12 you had better believe the Galaxy S10 and Pixel 3 phones would pay to use it.
  • truckasaurus - Thursday, May 31, 2018 - link

    Apple has the convenience of designing to a very specific application. Qualcomm ultimately has to create something that can go into many platforms that are defined no more completely than 'high-end mobile'. That's like asking why the 3.5L V6 that's in most of Toyota's vehicles only makes 268 hp, but in the Lotus Evora 400, which uses the same engine, makes 400 hp. It's because it has been tweaked for a very specific application.
  • serendip - Thursday, May 31, 2018 - link

    Then they should sell supercharger kits for Qualcomm chips ;-)

    That is how the Evora V6 gets 400 hp compared to 280-300 hp on the latest non-turbo versions of that V6. The twin turbo version of that engine on the Lexus LS makes over 450 hp too.
  • truckasaurus - Thursday, May 31, 2018 - link

    That's essentially what I'm getting at. Qualcomm makes the generic version of the engine that can go into a sedan, an SUV, a coupe, and a convertible and adequately power all of them. Apple says, we're only going to make 1 model of sports car and one large luxury sedan, and because we know exactly what our constraints are on these two platforms, we can add a turbo or a supercharger, we can tweak the timing, we can put a high-flow exhaust on it, etc.
  • Pneumothorax - Friday, June 1, 2018 - link

    Your point would make sense if the 845 were being used in low end Androids. Since it's pretty much only being used in high end designs, all out performance should've been the goal.
  • syxbit - Thursday, May 31, 2018 - link

    None of what you're saying makes sense. I simply think QCOMM and the rest are behind Apple because they can't do as good a job as Apple. It isn't because the market doesn't exist or because they need to build flexible designs.
  • SirPerro - Friday, June 1, 2018 - link

    Oh but many of us reading this conversation think all that really makes sense, and it really is because the market doesn't exist or because they need to build flexible designs.

    The car engine analogy was pretty great. It's exactly like that.
  • Threska - Thursday, May 31, 2018 - link

    True. What Apple is good at is showing the potential for what's possible. Other's may have their reasons for not reaching it, but at least none can say it's not possible.
  • shadowx360 - Thursday, May 31, 2018 - link

    A lot of it comes down to power consumption. Samsung managed to get close to the A10 performance but at the cost of much higher power draw. With a 4 wide decoder instead of 6 wide, ARM is able to keep power usage in check and if their claims are to be believed, A10 performance at half the power is probably more desirable to the average consumer than A11/A12 performance at Snapdragon 810 levels of thermal throttle.
