On Friday afternoon, Intel published a letter by Jeff McVeigh, the company’s interim GM of their Accelerated Computing Systems and Graphics group (AXG). In it, McVeigh offered a brief update on the state of Intel’s server GPU product lineups and customer adoption. But, more importantly, his letter offered an update to Intel’s server GPU roadmap – and it’s a bit of a bombshell. In short, Intel is canceling multiple server GPU products that were planned to be released over the next year and a half – including their HPC-class Rialto Bridge GPU – and going all-in on Falcon Shores, whose trajectory has also been altered, delaying it until 2025. There’s a lot to digest here, so let’s dive right in.

Intel’s most recent public server GPU product roadmap, released back during SC22 in May of 2022, outlined that Intel intended to release server GPU products at a fairly rapid clip. This was (in part) to make up for lost time from the multiple delays involving Ponte Vecchio, Intel’s first HPC-class GPU that is now sold under the Data Center GPU Max family. At that event, Intel announced Ponte Vecchio’s successor, Rialto Bridge, which was to be an evolution of Ponte and was set to sample in mid-2023. Following up on Rialto would be Falcon Shores in 2024, which along with employing an even newer version of Intel’s Xe GPU architecture, would be Intel’s first combined CPU+GPU product (XPU) and was slated to be the ultimate evolution of both Intel’s HPC CPU and GPU lines.

But, citing “a goal of maximizing return on investment for customers”, Intel has significantly refactored their server GPU roadmap, cancelling their previously planned 2023/2024 products. Rialto Bridge, announced less than a year ago, has been scrapped. Instead, Intel will be moving straight on to Falcon Shores for their HPC GPU needs.

These cancellations also impact Intel’s Flex line of server GPUs for cloud gaming and media encoding, with Intel axing Lancaster Sound (aka “Next Sound”) in favor of moving on to its successor, Melville Sound. Melville Sound’s development will be accelerated, in turn, though Intel hasn’t attached a hard date to it. Previously, it was expected at around the same time as Falcon Shores.

Intel did not provide any new visuals for the new roadmap, but I’ve gone ahead and kitbashed something together based on their SC22 slide in order to show where things stand, and what products have been cancelled

According to Intel, these product changes are designed to allow the company to align itself to a two-year product release cadence for server GPUs. Rivals NVIDIA and AMD have been operating at a similar cadence for the past several years, so this would bring Intel’s product cycles roughly in line with the competition. Which, as Intel puts it “matches customer expectations on new product introductions and allows time to develop their ecosystems.”

Reading between the lines, the implication is that Intel wasn’t confident on their ability to sell Rialto and Lancaster to their core customer base. Whether that’s solely because of the rapid product release cycles, or if there was something more to it, is a detail that will remain inside the halls of Intel for now. But regardless, the company is essentially scrubbing their plans to release new server GPU products this year in order to focus on their successors later on.

And in case there’s any doubt about whether this is a good or bad development for Intel, I’ll note that this information was released after 5pm Eastern on a Friday. So Intel is very clearly trying to bury bad news here by releasing it at the end of the week.

Falcon Shores Goes from an XPU in 2024 to a GPU in 2025

Besides cancelling Rialto Bridge and Lancaster Sound, the other major update to come from Intel’s letter is the revelation that Intel has significantly refactored their plans for Falcon Shore. What was (and may yet still be) Intel’s first combined CPU+GPU product for HPC has now been retasked to serve as Intel’s next-generation HPC GPU. Which in turn has significant repercussions for Intel’s product lineup, and their competitive positioning.

Falcon Shores was another 2022 Intel announcement, which was revealed back at the company’s 2022 Investor Meeting in February. At a high level, Falcon Shores was designed to be Intel’s first XPU product – a chip using a variety of compute architectures in order to best meet the execution needs of a single workload. This would be accomplished by using separate processor tiles, and while in practice this primarily meant putting Xe GPU tiles and x86 CPU tiles together on a single chip, the tiled approach left the door open to adding other types of accelerators as well. Intel is already doing tiled server chips with products like Ponte Vecchio and the recently launched Sapphire Rapids XCC, so Falcon Shores was slated to be the next step by placing more than just one type of accelerator on a chip.

At the time, Falcon Shores was slated for release in 2024, to be built on an “angstrom era process”. As planned, Falcon Shores would have been Intel’s power play for the HPC industry, allowing them to deliver a combined GPU+CPU product based on their latest architectures and built on their cutting-edge 20A/18A process node, givng them an edge in the HPC market and putting them ahead of traditional GPU-based accelerators. And while Falcon Shores may yet live up to those goals, it won’t be doing so in 2024. Or even 2025, for that matter.

Instead, Falcon Shores has been retasked to serve as an HPC GPU part. Intel’s brief letter doesn’t go into the technical changes, but the implications are that rather than mixing CPU and GPU tiles on a single chip, Intel is going to devote itself to building a product out of just GPU tiles, reducing what was to be intel’s XPU into a more straightforward GPU.

With the cancellation of Rialto Bridge, it goes without saying that Intel needs a new HPC-class GPU product if they want to remain in the market. And while there’s currently no reason to think that Falcon Shores won’t be a good product, the fact that Intel has delayed it until 2025 is not a promising sign. Ponte Vecchio is already an older design than its age lets on – it was initially intended to launch in 2021 and compete with the HPC GPUs of that era – so by not having anything newer to offer until 2025, Intel is essentially ceding the HPC GPU market for the next two years, outside of their supercomputer wins. Intel’s long-term plans still call for the company to take a sizable chunk of the highly profitable server GPU market – to swipe a piece of what’s largely been NVIDIA’s pie – so these developments essentially put those plans on hold for another two years.

And what may prove worse, this puts Intel even farther behind its competitors in shipping an XPU/APU-like product. As confirmed by Serve The Home, while Intel hasn’t given up on shipping a Falcon Shores XPU, the prioritization of GPUs means that such a product wouldn’t launch in 2025. That means an Intel server-class XPU is now a 2026 product at best – and longer still at worst.

Meanwhile AMD will be shipping a similar server APU later this year with the Instinct MI300. That part will employ multiple chiplets/tiles on a single chip in order to offer HPC-class CPU and GPU performance on a single chip, accomplishing a long-awaited goal for AMD. And while NVIDIA is a bit farther behind in integration with their Grace Hopper superchip – it’s essentially two tightly-coupled chips on a single board, rather than being chiplets on a single chip – it’s still ahead of where Intel is today. And, worryingly for Intel, the next-gen successor to that part is almost certain to launch before any potential Falcon Shores XPU hits the market.

In other words, by pushing back their server GPU schedule, all of Intel’s products that relied on it are now similarly delayed. That leaves Intel’s server CPUs to hold the line on their own for the next few years.

A History of Server GPU Cancellations as Intel Keeps Trying

Ultimately, the cancellation of Rialto Bridge makes it the latest in what has become a surprisingly long line of cancelled GPUs and GPU-like accelerators over at Intel. The company has, in one form or another, been attempting to break into the accelerator market since the late ‘00s, with projects such as Larrabee.  And now, 15 years later, Intel still isn’t as far along as it wants to be, and is still cancelling chips to get there.

As the latest in Intel server GPU casualties, Rialto Bridge is joining not only Larrabee, but Intel’s ill-fated Xeon Phi lineup as well. And even then, Rialto Bridge can’t claim to be the first Xe architecture part to get canned – that honor goes to Xe-HP, which was cancelled in favor of using Intel’s mixed compute and graphics Xe-HPG silicon.

The silver lining to this situation, at least, is that Intel isn’t cancelling a whole architecture here. The Xe architecture has firmly established itself as Intel’s in-house GPU architecture, and the fact that it’s shipping in everything from SoCs to video cards to HPC accelerators underscores its viability and importance at Intel. The cancellation of Rialto Bridge coupled with the delay of Falcon Shores is still a significant setback for Intel, but at the end of the day it’s just shelving one iteration of the broader Xe architecture for another.

In the meantime, Intel will keep working to expand their GPU offerings, and to break into the HPC GPU market in a bigger way. It’s a market that is simply too big and too profitable for Intel to ignore – least it eats even more into their server CPU sales – so they will have to keep at it until they find the success they seek.

Source: Intel

Comments Locked


View All Comments

  • mode_13h - Sunday, March 5, 2023 - link

    > Current Ponti Vecchio customers are unlikely to replace these systems with Rialto next year

    I'm sure that was never the expectation. The point of Rialto Bridge would be to try and keep pace with AMD and Nvidia, who are launching products newer than what Ponte Vecchio was designed to compete with.
  • KurtL - Monday, March 6, 2023 - link

    I agree with that. HPC centers are not gamers, equipment is used for 5-6 years at the moment as the evolution of semiconductor technology has slowed down enough that chips remain sufficiently performance-per-Watt competitive that long. My impression was always that Rialto Bridge was an effort to make Ponte Vecchio more manufacturable and hence a bit more affordable.

    What the market probably needs most at this time though is a low-cost chip from both AMD and Intel that still supports their respective software stacks well and can be used as a development platform, the way CUDA became popular because every school kid could learn to use it on their PC.
  • mode_13h - Monday, March 6, 2023 - link

    I wonder to what extent Arctic Sound (or its consumer-grade Arc cousin) can fill in as a suitable proxy for oneAPI developers.
  • JayNor - Saturday, March 4, 2023 - link

    I recall discussions about HPC users moving away from heterogeneous nodes and towards disaggregated racks of GPUs.

    The IPU/DPU chips and the DSA on the SPR CPUs and ROCE DMA controllers on Intel's Habana chips all can function to push data to the GPUs. Intel's Ponte Vecchio has prefetch streaming instructions for L2.

    I'm not saying that GPUs don't need something to feed them, just that it doesn't have to be the job of a server class CPU.
  • Yojimbo - Sunday, March 5, 2023 - link

    I think it depends on the workload. With some workloads it is necessary for the GPUs to act on large amounts of data. Then it must be copied in and out of main memory, which is slow and consumes a lot of energy. Also some workloads benefit from CPUs and GPUs acting on the data, i.e., in a truly heterogeneous way. Then having the processors both close to the memory again is faster and saves energy.
    It's in hyperscale settings where disaggregation is likely preferred. In such settings you have a large amount of compute resources but they are being shared by many relatively lightweight tasks. Finally, with supercomputers, current DPUs would likely be swamped and also a lot of the code run on them need powerful CPUs,
  • Kevin G - Sunday, March 5, 2023 - link

    This is where a good chiplet strategy comes into play. Need fast memory? Add a HBM memory controller and memory tiles. Need capacity? Add DDR5 memory controller tiles. As long as the die-to-die interconnects are good, this provides incredible flexibility in end products.
  • mode_13h - Sunday, March 5, 2023 - link

    Use CXL for external memory, not DDR5. CXL can be used either for memory capacity or interconnect bandwidth, plus it's cache-coherent (unlike PCIe).
  • The Von Matrices - Sunday, March 5, 2023 - link

    CXL memory nodes are just DDR memory controllers with the additional latency of being a remote node. Local memory controllers will always be faster.
  • mode_13h - Monday, March 6, 2023 - link

    Right. You don't use CXL (or DDR5) for bandwidth, on these things. HBM much better serves that purpose. To the tune of 3.3 TB/s, in all of PVC, MI250X, and H100!

    That's a completely impractical target to achieve using DDR5, so external DRAM is simply relegated to the role of a slower memory tier for capacity scaling. To that end, you might as well just use CXL, because then you can reap its other benefits.
  • Kevin G - Monday, March 6, 2023 - link

    The reason to favor native DDR5 over CXL memory expanders is the raw capacity you can get from going massive wide with 8-12 channels in addition to HBM. HBM simply doesn't have the raw native capacity for many workloads and thus nodes will have to be supplemented. Any work loads that need even more would be well suited for CXL with a 3-tier memory setup (HBM > DDR5 > CXL).

Log in

Don't have an account? Sign up now