"Set to launch in 2022, Sapphire Rapids will be Intel’s first CPU product to take advantage of a multi-die architecture". Doesn't that comment forget the Pentium D, Kentsfield, etc?
Thanks, the edit works quite well. It is just a pet peeve of mine. You could have also been more blunt and I would have been fine (Intel's first tiled approach was clumsy at best).
If you want to go waaaaaay back, there is the Pentium Pro with its separate L2 cache die on package.
There is also Westmere which had a 32 nm CPU and 45 nm north bridge in the same package. There are also a slew of recent mobile multi-die packages that integrate the chipset on package to reduce board area.
All of the renders and block diagrams make it look like Intel made two mirrored dies of what is otherwise the same chip design. Did Intel say anything about that?
The "glue" is everything, though. Whole article is really about that. Gotta characterize and benchmark it. At least on paper it looks very good, better than AMD's
yannigr2 was referring to just about everyone (even our very own Anand) calling the Pentium D two cores glued together. https://www.anandtech.com/show/1665/2 It is an old joke.
"The new core, according to Intel, will over a +19% IPC gain in single-thread workloads compared to Cypress Cove, which was Intel’s backport of Ice Lake."
I think Cypress Cove is a backport of Willow Cove. Ice Lake uses Sunny Cove. There isn't much IPC improvement from Sunny Cove to Willow Cove, if I recall, so 19% IPC over Ice Lake might still be accurate even if the information given is technically wrong.
The "it" in my sentence referred to Intel. Intel doesn't include clock speed increase in its measure of IPC increase. Intel compares "iso-freq" (same frequency) as per this graphic for Sunny Cove over Skylake https://images.anandtech.com/doci/14514/Blueprint%... ...or this one for Golden Cove over Cypress Cove.
They're comparing the uplift to a core that wasn't used in Xeon processors - and was manufactured on a 14nm process for consumers - so I'm feeling extremely suspicious about how this would look compared to, say, Tiger Lake (also a consumer core, but at least on a comparable process).
My post was meant to point out an error in the Anandtech article, not to discuss Intel's IPC claims. But I am guessing when Intel compared the Golden Cove IPC to Cypress Cove it was in relation to Alder Lake. It's not clear if Intel repeated that comparison with respect to Sapphire Rapids or if Anandtech included it here since it was the only comparison they had. It's not a bad comparison, though, because the underlying core is the same and I don't think they change their basket of operations for their IPC averages for client and server (although maybe that's wrong and they do). The problem is it doesn't relate directly to Ice Lake very well, something that isn't clear the way the article is written because of the mistake I pointed out.
Oh ok, thanks. You are right. I think I read something some time ago with bad information and remembered the bad and forgot the good. Well, that spins Rocket Lake in a slightly different light from the impression I have had of it.
Yeah it is a really weird mashup of architecture and choices. I still think they mostly used it as a test platform for backpacking as Intel seems very determined to have forthcoming architectures node agnostic.
Not sure what I'm more impressed by...www.IntelTech.com giving us another Intel marketing presentation OR Intel finally catching up to AMD's Zen 1 in 2022.
I was under the impression that Intel had a fabric for heterogeneous memory usage between the CPU and GPU on the Aurora supercomputer, but checking after reading this article, I noticed that the Aurora website lists the CPU-GPU link as PCI Express. I guess that makes the selection of an HPE A100 machine as a development platform for Aurora make more sense, but it's a bit surprising. Now I'm curious if anything different was mentioned back when the A21 specs were revealed a couple of years ago.
So... Lots of cores probably, but we don't know how many. No word on costs, but 4 large dies on a process that has yet to yield well + EMIB with high-power chips + double the number of masks required for the product doesn't speak to being economical. CXL 1.1, only not exactly, just bits of it. Lots of AI-friendly stuff, but people are already doing AI stuff on dedicated accelerators. Moar Sockits, though! Whee!
Honestly, I'm a little concerned. This looks like it's going to be wildly expensive and not very compelling. I hope the smaller dies are a little less absurd.
Someone has already disassembled and posted images of a Sapphire Rapids XCC engineering sample: https://www.bilibili.com/read/cv11692201 The SPR XCC SKUs are expected to top out at 56 cores, however, the individual dies clearly feature a 4x4 grid with 15 CPU core tiles plus an IMC tile (for the two DDR5 controllers). The dies are 426 mm², and according to the roadmaps at least one CPU core per die will be disabled.
Intel typically produces three floorplan designs for each Xeon Scalable generation, e.g. XCC, HCC, and LCC. While they are burning two tape-outs with the left and right XCC dies for SPR, there is also supposedly an up to 34 core MCC product in the works. If Intel were to add another column to the XCC die—three more CPU core tiles and another IMC tile—they would end up with an 18-core die with four memory channels. A two die package would have up to 36 cores (or 34 with at least one core disabled on each die) and still have 8-channel memory. An 18-core, 4-channel HEDT or workstation product would also be possible using a single die.
Interesting. That would certainly help to fill out a bunch of products further down the line, but wouldn't it also require double the masks to produce a mirror design of the die?
Still, it (alongside the process improvements) should at least help end the yield problems they've had with Ice Lake SP.
I always get a kick these days to see "Intel's Process 7" stated with a straight face and no qualifier. Trying desperately to achieve some sort of marketing-level process equality with AMD/TSMC (who can unashamedly and accurately say "7nm".) This is what "we're behind" marketing looks like, I guess. Intel seems to have learned a lot from its previous association with Apple--well, at least in terms of marketing...;)
Wanted to add that all of this wordage used to describe vaporware is beginning to remind me of a lot of Larrabee. Although to be fair, I really do think that at some point Intel will have something to sell here...at some point. That'll be nice because then we won't be discussing vaporware any longer.
What if I tell you they're technically close to equivalent to TSMC 7 and Samsung 5? You can specify a process to be as advanced as you want, the question is whether/how well you can build it. Originally what was called Intel 10nm was very ambitious, way better than TSMC's 10nm. Only it would take Intel 7 years in between nodes rather than the expected 2-3, giving TSMC all the time to catch up and more. Today TSMC is in pole position - we're just waiting for their 5 to mature to reach desktop/server power density.
The "nanometers" you're so faithful to are just a small bit of the entire wafer-to-microprocessor saga. Intel, Global Foundries (ex-AMD), TSMC, Samsung and the others have different measurements in all of these areas, and some production choices are better than other. The "Feature Size" (the so-called nanometers) is not the only "measuring stick" - you also have vias (i.e. the width of the electrical connections), the uniformity with which you can produce something (leading to a lower voltage overall, as there are no "outliers" that need that extra voltage to work), ... As for "transistor density" - this varies across processes and process variants, and is also affected by the "library" (high density or low density - one for the most transistors at lower power, the other with less transistors on the same area but it has better frequency and can cool a higher power-per-transistor). Anyway, here's an article with helpful pictures - Intel 10nm is relatively similar to TSMC 7nm in density, it's below just by a bit and not by the 49% straight math would suggest (i.e. 10x10=100 transistors in a 70nmx70nm square for TSMC and 7x7=49 transistors for Intel 10nm). https://hexus.net/tech/news/cpu/145645-intel-14nm-...
Process names have been marketing, not measurements of any actual feature size since they were naming processes by microns. And others have noted that what Intel's calling Intel 7 now is pretty similar density to TSMC N7 for CPUs. I'd also note that TSMC's actual name for what's commonly called their "7nm" is "N7", not "7nm"; no nanometers there either.
page 1, Golden Cove: A High-Performance Core with AMX and AIA, text under the AMX picture: "AMX uses eight 1024-bit registers for basic data operators" should be 1024 BYTE (or 1KByte) not 1024-bit. AMX has 8 (row/column) configurable 1KB so called T registers, i.e. the 8 T registers can be configured to use a maximum size of 1KByte each but can also be smaller configured by row and columns parameters (you set tile configuration for each tile with the STTILECFG assembly instruction: i.e. row, columns, BF16/INT8 data type etc). For more details see AMX section in this document: https://software.intel.com/content/www/us/en/devel...
Cant edit so have to use a comment to clarify: LDTILECFG is used for setting the tile file configuration of all 8 tiles (# of rows and # columns per T register, while Data type is not set by this instruction) while STTILECFG is used for reading out the current tile file configuration and store the read out store that in memory.
It's a type from Intel on the slides that you unfortunately propagated. Should be 1KByte not 1Kb (as in 1 Kbit). yeah this presentation was not one of intel's finest moment...
just read the full spec ere: https://software.intel.com/content/www/us/en/devel... There is significantly more detail in the full documentation. all sorts of limitation on number of rows (max 16) for instance which complicates INT8 matrices just as an example... What I would have liked would be to be able to is to fully configure # of rows and # of columns within the 1KByte for a given data type - to fully use each T register 1KByte size. We now need to have rectangular NxM matrix tiles instead of the preferable square NxN matrix tiles (and fit them into 16xM = 1024 bytes, solve for M)- symmetric N x N tiles makes algorithms easier...
Ian, to be clear the intel AMX specs in the intel doc:https://software.intel.com/content/www/us/en/devel... spends entire chapter 3 (25 pages) discussing AMX in detail. Stating multiple times that each T register is 1KByte and the whole register files size is 8KByte, also detailing each assembly Instruction etc. Additionally, first rev of this document was published last summer and the latest rev was published in June this year. During this whole time the T register 1KByte size have never changed (but more details have been included with each revision the past 12 months). Further, glibc and various compliers have already included AMX extensions based on this spec. it would be quite catastrophic for them if intel suddenly cut the T reg size to 1024 bits. Also, T reg size is not really new news. https://fuse.wikichip.org/news/3600/the-x86-advanc... published a pretty good article already last summer about this (also stating T regs are 1Kbyte). Lastly, it makes no logical sense to only have 1024bit (128Bytes) tile regs because it is just too small. Hence, you can safely assume that intel messed up on the slide and adjust your article accordingly. If you still don't believe it, ask intel yourself.
One of the rumors for Gen 4 Epyc is 12 channels of DDR5. Now this is just a rumor so it HAS to be taken with a grain of salt. However, if Epyc goes 12 channels, Arm goes 12 channels, and SPR is at 8 channels we could see another instance like Gen 1 & 2 Xeon Scalable not having RAM parity. While going DDR5 does increase the bandwidth, I don't think it does enough to justify not increasing the channels at the same time.
Bandwidth might be OK for AI with HBM on SPR. One thing to remember is that most of these are going to be running on hypervisors. 6 channel RAM became immediately an issue with Xeon Scalable (especially with their old 1TB/socket limit without L series CPUs where you could only get 768GB RAM). If they only have 8 channels when everyone else has 12 channels you cannot put as much RAM into a system for cheap. Most servers are dual socket and if you are using a hypervisor RAM capacity matters A LOT. If you can have 1.5TB (dual socket with 64GB DIMMs) instead of 1TB (dual socket with 64GB DIMMs) that makes a huge difference for running VMs. All the hosts in my datacenter run with 1TB RAM & dual 32c/64t CPUs. We are not CPU limited but we are RAM limited on each host. While VMware can do RAM compression/ballooning, once you start over provisioning RAM you will start running into performance issues. I've read that after about 10-15% over provisioning on RAM you start getting pretty major performance loss. I've experienced VMs basically stall out (like what happened in the early 2000s when your computer used 512MB RAM and you only had 384MB RAM) at a 50% over provision. Basically depending on the workload bandwidth isn't everything.
It seems 99% likely that we're looking at 8 cores per tile for a max of 32 cores in the package. Intel has so far proven incapable of making a single piece of functional silicon with more than 8 large cores on it using anything smaller than 14nm.
All the rumors for Sapphire Rapids is pointing to 14 Cores per Tile in a 4x4 Grid. 2x of those nodes are for Vertical/Horizontal Interconnect management.
Not exactly. 4x4 grid with 15 CPU core tiles and one IMC tile for the two DDR5 controllers. See my post further up for additional details and link to actual die shots.
*technically DDR5 puts two 32-bit channels on a single module, but as yet the industry doesn’t have a term to differentiate between a module with one 64-bit memory channel on it vs. a module with two 32-bit memory channels on it. The word ‘channel’ has often been interchangeable with ‘memory slot’ to date, but this will have to change.
JEDEC calls them independent channels, that is 2 channels per module. Hence, an eight DIMM server board has a 16 channel memory system. I don't know why these terms are in flux everywhere.
Because people have been interchanging module and channel for years, and one module = one channel. The fact that DDR5 moves down to 32-bit channels from 64-bit channels means I'm going to be sprinkling the word controller around two be absolutely specific.
The lower core count versions will certainly be interesting. If the comments are correct and it is 14 cores per tile then you'd have 56 cores max. Certainly you could see them doing 52 and 48 core versions from die harvesting with 1 or 2 cores disabled per tile but the further below that you go the less it makes sense. On the other hand looking at the high level chip diagrams you pretty much have IO going around the entire outside of the cluster of tiles. I'm not sure how much smaller you can make the tiles and still have enough room for all the IO. What's the min core count going to look like? Are there going to be a 16 or 20 core version? Are they still going to use tiles for those or design a different monolithic die?
It would be logical to infer that they're going to need at least one more monolithic design, to allow for designs with fewer tiles with the same number of memory channels.
Unless they just leave the lower-core designs with less memory bandwidth, which would be a product segmentation strategy of sorts, I guess?
Finally something innovative from Intel after years of abandonment of HEDT and Xeon leadership. I would give credit to Intel here because no big little BS scam. Mirroring the die mask design and using separate on top of the design which is using an EMIB on such a large silicon damn it looks super complicated vs AMD chiplet strategy.
Now for the worst part, the I/O controller and Memory controller. That is going to be an issue for this design, Intel's mesh had power problems and inferior x86 performance on SKL Xeon and the Ice Lake solved that issue but core problem is AMD solved the EPYC 7000 series Zen based chiplets into a central I/O die and memory controller design eliminating the NUMA node performance hit. With smaller path trace due to EMIB this looks great but still it will have that hopping issues of Zen design based processors.
So a SPR based HEDT LGA4xxx socket is coming but when ? 2022 ? Zen 3 Threadripper Chagall / Genesis Peak is coming this year. And Zen 3 based V-Cache Milan EPYC will be coming next year once the factories start producing them and they will be dripped to AM4 socket processors. SPR needs to prove a lot, Zen 4 is dropping soon with 96C and beastly IPC on top of 12 Channel memory design on TSMC 5N.
IPC is a whole big another equation BUT most important is how the Intel 7 / 10nm design is vs TSMC 7N based EPYC in terms of clock potential and efficiency to performance ratios. Esp the fact that Intel had to cut off the x86 cores into those small SKL inferior crap cores onto the LGA1700 socket to keep up with the rising power consumption of their x86 processor designs. This one maxes out at 56C apparently with each tile at 14C, a big shame all these 14C couldn't make it to the LGA1700 they would have been perfect for the desktop compute, for those stupid thin and light BGA junk sales they axed it and shoved those efficiency designs into the Desktop LGA platform.
The NUMA Domain is going to be interesting with the 4x Memory Controllers split on each die having to cross domain boundaries.
And there appears to be 2x Cross-Tile interconnects on each tile that hold CHA (Caching and Home Agent) and a LLC (Last Level Cache) to handle resolution of coherency across multiple processors.
Each core has a tiny CHA to check against the larger dedicated CHA & LLC to improve scalability according to the number of cores, memory controllers, or the sub-NUMA clustering mode. The address space is interleaved across different CHAs, which act like a single logical agent.
So that's a interesting solution to their data locality issues when multi-threading or having cross core communication.
Why do you presume there will be any NUMA domain boundaries on-package? The whole point of going with EMIB and tiles vs. conventional chiplets on an organic substrate is that the EMIBs essentially act as "long wires" bridging the metal layers of the individual dies and extending the mesh without having to pipe the data over a separate link with a SerDes / PHY at each end.
The CXL.mem feature in Emerald Rapids can be seen in the slides in the adoredtv transcription article, "intel-rapidly-getting-worse", from June 29, 2021.
Itanium, Optane, Knight's Landing, and Intel's original 10nm plans were all cool pieces of tech, too. Yet they lost to less cool tech like AMD64, flash, GPUs, and TSMC 7nm.
After reading all this I'm left wondering if Intel designed this thing to showcase how cool EMIB is rather than EMIB enabling the optimal design (aka, a misalignment between cart and horse).
I look forward to seeing how this super cool glue compares to the combo of AMD"s Elmer's plus a big slab of SRAM plus high yielding multi-use chiplets.
I forgot to ask about the Alderlake AVX support. I quickly looked at the Intel's presentation and it casted AMX acceleration as enterprise feature in no uncertain terms, which the slides show in this article as well (some journalists got confused at this point). However, the AVX512 support was simply thrown in as a feature of all P-cores without a distinction. Where did you get the information on the status of the consumer AVX512 support?
ADL E-cores do not have AVX512 and there's no effective way currently deal with having two different levels of AVX support in the same processor. If you read through the ADL coverage here on AnandTech you'll see it isn't in any of the slides, and I believe it was Ian who confirmed they're fusing it off for ADL.
HBM variants are quite interesting. HBMs can work as an L4 cache, but they have latency penalties due to lower clocks(thermal limit). Quite curious how high latency + high bandwidth will work with various workloads.
Intel commented that HBM wasn't fast enough for all their needs on Ponte Vecchio. They added large L1 and L2 caches and a Rambo Cache that has yet to be fully described.
I'm almost disappointed they're not using gracemont - I suppose that wasn't realistic. In any case, either the tantalizing leaks of gracemonts perf and efficiency are false, or it would be a fantastic server chip - almost 4x the cores, albeit significantly slower ones? Yes please!
They would run straight into the ARM powered microprocessors - see https://aws.amazon.com/ec2/graviton/ Intel doesn't really want to compete at the low end (many cheap cores), its entire marketing message is "we make fast and expensive microprocessors in the server market".
Knights Landing was Atom based platform. Tons of Atom cores on a giant LGA4xxx socket and it died. The only reason they shoved those pathetic cores onto LGA1700 is to compete in SMT applications against Ryzen without blowing a hole in the power consumption. Once we get the TDP of these new Xeon SPR processors we will get an idea on how a 14C tile operates, they avoided going 14C on the desktop also due to scaling them onto the crappy BGA trashware where their efficiency works to combat Apple M series and AMDs APUs. That is the highest margin for Intel apart from Datacenter business of Xeon. Their desktop is lowest priority of all 3. So they do not care about that but they wasted ton of cash on that stupid Intel Thread Director.
They've made some "lots of Atom cores" processors before; it wouldn't be too shocking if they did again. Though swapping out 14X Golden Cove tiles for 56X Gracemont tiles to build a 224-core monster seems ... improbable.
Recent Intel CPUs have been power hogs when the frequency is raised sufficiently to start to compete with the AMD CPUs. (So much so that Cloudfare gave up on Intel CPUs in its recent servers (See https://www.theregister.com/2021/09/01/cloudflare_... for more details).) For any large scale deployment of servers, the power consumption of the servers is as important as their performance. A quote from a Cloudflare engineer "Although Intel's chips were able to compete with AMD in terms of raw performance, the power consumption was several hundred watts higher per server – that's enormous."
"This means that if SPR is going to offer versions with fewer cores, it is going to either create dummy tiles without any cores on them, but still keep the PCIe/DDR5 as required, or quite simply those lower core counts are going to have fewer memory controllers."
Or these mirrored die pairs will have very aggressive stratified binning.
Intel rely on short memories. When Zen appeared, they were notorious for forcing multi socket servers (gouging) on folks who really only needed single sockets - the exact opposite of aiming for better processor interconnects.
They know how vital tdp is as a metric. If their solution is so competitive - why does it push the maximum acceptable tdp limits? It smacks of desperation.
"The word ‘channel’ has often been interchangeable with ‘memory slot’ to date, but this will have to change."
I get what you're saying, but as-worded that hasn't been true like... ever. I'm typing this on a newfangled dual channel system with 4 (count em, 4!) memory slots. I know it would be confusing to say you've got 16 x 32 bit channels for 8 or 16 modules, people might think you're only enabling half the channels with 8 modules. You could always specify double-channel modules or something of that nature.
Set to launch in 2022, Sapphire Rapids will be Intel’s first CPU product to take advantage of a multi-die architecture". Doesn't that comment forget the Pentium D, Kentsfield, etc?..
Re. LLC and retaining 8 memory channels - I surmise that unnecessary UPI components are reconfigurable as... Fpgaish perhaps? Also I'd guess the hbm versions wouldn't need more pins, just a fat package, less pins in the sans-ddr version. Also can we have more focus, a dedicated article even, on the workstation (SP) variant, its been gone too long - can't let intel forget they haven't released a processor that interests actual computing-professionals for years.
We’ve updated our terms. By continuing to use the site and/or by logging into your account, you agree to the Site’s updated Terms of Use and Privacy Policy.
94 Comments
dullard - Tuesday, August 31, 2021 - link
"Set to launch in 2022, Sapphire Rapids will be Intel’s first CPU product to take advantage of a multi-die architecture". Doesn't that comment forget the Pentium D, Kentsfield, etc?Ian Cutress - Tuesday, August 31, 2021 - link
You're meant to take the rest of that sentence as well :) I've updated it to make it clearer.dullard - Tuesday, August 31, 2021 - link
Thanks, the edit works quite well. It is just a pet peeve of mine. You could have also been more blunt and I would have been fine (Intel's first tiled approach was clumsy at best).TrevorH - Tuesday, August 31, 2021 - link
I Think you could have just put a full stop on the end of "Set to launch in 2022, Sapphire Rapids will be Intel’s first modern CPU product"Kevin G - Tuesday, August 31, 2021 - link
If you want to go waaaaaay back, there is the Pentium Pro with its separate L2 cache die on package.There is also Westmere which had a 32 nm CPU and 45 nm north bridge in the same package. There are also a slew of recent mobile multi-die packages that integrate the chipset on package to reduce board area.
ballsystemlord - Tuesday, August 31, 2021 - link
The Intel Pentium D was also multi-die. I know, I took one apart. I have pictures.
jordanclock - Tuesday, August 31, 2021 - link
I'm not sure if I would call duct taping together two P4s as architecture as much as it was jerry rigged.
GNUminex_l_cowsay - Tuesday, August 31, 2021 - link
All of the renders and block diagrams make it look like Intel made two mirrored dies of what is otherwise the same chip design. Did Intel say anything about that?
Ian Cutress - Tuesday, August 31, 2021 - link
Yup, that's in the article.
yannigr2 - Tuesday, August 31, 2021 - link
Just 4 CPUs glued together but with 2021 marketing in the presentation.
Wrs - Tuesday, August 31, 2021 - link
The "glue" is everything, though. Whole article is really about that. Gotta characterize and benchmark it. At least on paper it looks very good, better than AMD's
dullard - Tuesday, August 31, 2021 - link
yannigr2 was referring to just about everyone (even our very own Anand) calling the Pentium D two cores glued together. https://www.anandtech.com/show/1665/2 It is an old joke.
heickelrrx - Friday, September 3, 2021 - link
it's not a glue
but ducktape
Yojimbo - Tuesday, August 31, 2021 - link
"The new core, according to Intel, will over a +19% IPC gain in single-thread workloads compared to Cypress Cove, which was Intel’s backport of Ice Lake."I think Cypress Cove is a backport of Willow Cove. Ice Lake uses Sunny Cove. There isn't much IPC improvement from Sunny Cove to Willow Cove, if I recall, so 19% IPC over Ice Lake might still be accurate even if the information given is technically wrong.
yeeeeman - Tuesday, August 31, 2021 - link
Actually there isn't much IPC improvement even from Sunny Cove to Willow Cove.
Yojimbo - Tuesday, August 31, 2021 - link
You repeated something I already had written.
shabby - Tuesday, August 31, 2021 - link
Intel's idea of IPC improvement comes from the higher MHz the chip runs at, they're skewing what IPC means.
Yojimbo - Tuesday, August 31, 2021 - link
No it doesn't.
Foeketijn - Wednesday, September 1, 2021 - link
Yes it does. IPC is performance per Hz. Double the speed, double the performance means same IPC.
Yojimbo - Wednesday, September 1, 2021 - link
The "it" in my sentence referred to Intel. Intel doesn't include clock speed increase in its measure of IPC increase. Intel compares "iso-freq" (same frequency) as per this graphic for Sunny Cove over Skylake https://images.anandtech.com/doci/14514/Blueprint%... ...or this one for Golden Cove over Cypress Cove.
Yojimbo - Wednesday, September 1, 2021 - link
Oh here's the second link: https://images.anandtech.com/doci/16881/46.jpg
Spunjji - Tuesday, August 31, 2021 - link
They're comparing the uplift to a core that wasn't used in Xeon processors - and was manufactured on a 14nm process for consumers - so I'm feeling extremely suspicious about how this would look compared to, say, Tiger Lake (also a consumer core, but at least on a comparable process).
Yojimbo - Tuesday, August 31, 2021 - link
My post was meant to point out an error in the Anandtech article, not to discuss Intel's IPC claims. But I am guessing when Intel compared the Golden Cove IPC to Cypress Cove it was in relation to Alder Lake. It's not clear if Intel repeated that comparison with respect to Sapphire Rapids or if Anandtech included it here since it was the only comparison they had. It's not a bad comparison, though, because the underlying core is the same and I don't think they change their basket of operations for their IPC averages for client and server (although maybe that's wrong and they do). The problem is it doesn't relate directly to Ice Lake very well, something that isn't clear the way the article is written because of the mistake I pointed out.
thestryker - Tuesday, August 31, 2021 - link
Cypress Cove is a backport of Sunny Cove not Willow Cove, but it does have Xe graphics like TGL.
Yojimbo - Tuesday, August 31, 2021 - link
Oh ok, thanks. You are right. I think I read something some time ago with bad information and remembered the bad and forgot the good. Well, that spins Rocket Lake in a slightly different light from the impression I have had of it.
thestryker - Tuesday, August 31, 2021 - link
Yeah it is a really weird mashup of architecture and choices. I still think they mostly used it as a test platform for backporting as Intel seems very determined to have forthcoming architectures node agnostic.
DannyH246 - Tuesday, August 31, 2021 - link
Not sure what I'm more impressed by...www.IntelTech.com giving us another Intel marketing presentation OR Intel finally catching up to AMD's Zen 1 in 2022.
JfromImaginstuff - Wednesday, September 1, 2021 - link
I know you meant that as a jab, but that's an actual website.
DannyH246 - Wednesday, September 1, 2021 - link
hahahahaha i never knew!!!
Yojimbo - Tuesday, August 31, 2021 - link
I was under the impression that Intel had a fabric for heterogeneous memory usage between the CPU and GPU on the Aurora supercomputer, but checking after reading this article, I noticed that the Aurora website lists the CPU-GPU link as PCI Express. I guess that makes the selection of an HPE A100 machine as a development platform for Aurora make more sense, but it's a bit surprising. Now I'm curious if anything different was mentioned back when the A21 specs were revealed a couple of years ago.
Spunjji - Tuesday, August 31, 2021 - link
So...
Lots of cores probably, but we don't know how many.
No word on costs, but 4 large dies on a process that has yet to yield well + EMIB with high-power chips + double the number of masks required for the product doesn't speak to being economical.
CXL 1.1, only not exactly, just bits of it.
Lots of AI-friendly stuff, but people are already doing AI stuff on dedicated accelerators.
Moar Sockits, though! Whee!
Honestly, I'm a little concerned. This looks like it's going to be wildly expensive and not very compelling. I hope the smaller dies are a little less absurd.
Kamen Rider Blade - Tuesday, August 31, 2021 - link
According to leaks, 14 Cores per Tile is the Max / (100%) yield.
2 of the spots that would be used for a 4x4 2D-array of cores are used for Inter-connect management.
repoman27 - Tuesday, August 31, 2021 - link
Someone has already disassembled and posted images of a Sapphire Rapids XCC engineering sample: https://www.bilibili.com/read/cv11692201 The SPR XCC SKUs are expected to top out at 56 cores, however, the individual dies clearly feature a 4x4 grid with 15 CPU core tiles plus an IMC tile (for the two DDR5 controllers). The dies are 426 mm², and according to the roadmaps at least one CPU core per die will be disabled.
Intel typically produces three floorplan designs for each Xeon Scalable generation, e.g. XCC, HCC, and LCC. While they are burning two tape-outs with the left and right XCC dies for SPR, there is also supposedly an up to 34 core MCC product in the works. If Intel were to add another column to the XCC die—three more CPU core tiles and another IMC tile—they would end up with an 18-core die with four memory channels. A two die package would have up to 36 cores (or 34 with at least one core disabled on each die) and still have 8-channel memory. An 18-core, 4-channel HEDT or workstation product would also be possible using a single die.
Spunjji - Tuesday, September 7, 2021 - link
Interesting. That would certainly help to fill out a bunch of products further down the line, but wouldn't it also require double the masks to produce a mirror design of the die?
Still, it (alongside the process improvements) should at least help end the yield problems they've had with Ice Lake SP.
WaltC - Tuesday, August 31, 2021 - link
I always get a kick these days to see "Intel's Process 7" stated with a straight face and no qualifier. Trying desperately to achieve some sort of marketing-level process equality with AMD/TSMC (who can unashamedly and accurately say "7nm".) This is what "we're behind" marketing looks like, I guess. Intel seems to have learned a lot from its previous association with Apple--well, at least in terms of marketing...;)
WaltC - Tuesday, August 31, 2021 - link
Wanted to add that all of this wordage used to describe vaporware is beginning to remind me of a lot of Larrabee. Although to be fair, I really do think that at some point Intel will have something to sell here...at some point. That'll be nice because then we won't be discussing vaporware any longer.
Wrs - Tuesday, August 31, 2021 - link
What if I tell you they're technically close to equivalent to TSMC 7 and Samsung 5? You can specify a process to be as advanced as you want, the question is whether/how well you can build it. Originally what was called Intel 10nm was very ambitious, way better than TSMC's 10nm. Only it would take Intel 7 years in between nodes rather than the expected 2-3, giving TSMC all the time to catch up and more. Today TSMC is in pole position - we're just waiting for their 5 to mature to reach desktop/server power density.
Calin - Wednesday, September 1, 2021 - link
The "nanometers" you're so faithful to are just a small bit of the entire wafer-to-microprocessor saga. Intel, Global Foundries (ex-AMD), TSMC, Samsung and the others have different measurements in all of these areas, and some production choices are better than others.
The "Feature Size" (the so-called nanometers) is not the only "measuring stick" - you also have vias (i.e. the width of the electrical connections), the uniformity with which you can produce something (leading to a lower voltage overall, as there are no "outliers" that need that extra voltage to work), ...
As for "transistor density" - this varies across processes and process variants, and is also affected by the "library" (high density or low density - one for the most transistors at lower power, the other with fewer transistors in the same area but with better frequency and the ability to cool a higher power-per-transistor).
Anyway, here's an article with helpful pictures - Intel 10nm is relatively similar to TSMC 7nm in density, it's below just by a bit and not by the 49% that straight math would suggest (i.e. 10x10=100 transistors in a 70nmx70nm square for TSMC and 7x7=49 transistors for Intel 10nm).
https://hexus.net/tech/news/cpu/145645-intel-14nm-...
Foeketijn - Wednesday, September 1, 2021 - link
I agree, but would like to add that Intel's 10nm is about as small as TSMC's 7nm. It's like asking at what height you measure the tree trunk's width.
drothgery - Wednesday, September 1, 2021 - link
Process names have been marketing, not measurements of any actual feature size since they were naming processes by microns. And others have noted that what Intel's calling Intel 7 now is pretty similar density to TSMC N7 for CPUs. I'd also note that TSMC's actual name for what's commonly called their "7nm" is "N7", not "7nm"; no nanometers there either.
SystemsBuilder - Tuesday, August 31, 2021 - link
page 1, Golden Cove: A High-Performance Core with AMX and AIA, text under the AMX picture:
"AMX uses eight 1024-bit registers for basic data operators" should be 1024 BYTE (or 1KByte) not 1024-bit.
AMX has 8 (row/column) configurable 1KB so-called T registers, i.e. the 8 T registers can be configured to use a maximum size of 1KByte each but can also be configured smaller via row and column parameters (you set the tile configuration for each tile with the STTILECFG assembly instruction: i.e. rows, columns, BF16/INT8 data type etc).
For more details see AMX section in this document:
https://software.intel.com/content/www/us/en/devel...
SystemsBuilder - Tuesday, August 31, 2021 - link
Can't edit so have to use a comment to clarify: LDTILECFG is used for setting the tile file configuration of all 8 tiles (# of rows and # of columns per T register, while data type is not set by this instruction) while STTILECFG is used for reading out the current tile file configuration and storing the read-out configuration in memory.
Ian Cutress - Tuesday, August 31, 2021 - link
My slide from Intel architecture day says 1 Kb = 1 kilo-bit. It literally says that in the slide above the paragraph you're referencing.
So either a typo in the slide, or a typo in the AMX doc.
SystemsBuilder - Tuesday, August 31, 2021 - link
It's a typo from Intel on the slides that you unfortunately propagated.
Should be 1KByte not 1Kb (as in 1 Kbit).
Yeah, this presentation was not one of intel's finest moments...
Just read the full spec here: https://software.intel.com/content/www/us/en/devel...
There is significantly more detail in the full documentation. All sorts of limitations on the number of rows (max 16), for instance, which complicates INT8 matrices, just as an example... What I would have liked is to be able to fully configure # of rows and # of columns within the 1KByte for a given data type - to fully use each T register's 1KByte size. We now need to have rectangular NxM matrix tiles instead of the preferable square NxN matrix tiles (and fit them into 16xM = 1024 bytes, solve for M) - symmetric N x N tiles make algorithms easier...
SystemsBuilder - Tuesday, August 31, 2021 - link
Ian, to be clear the intel AMX spec in the intel doc: https://software.intel.com/content/www/us/en/devel... spends the entire chapter 3 (25 pages) discussing AMX in detail, stating multiple times that each T register is 1KByte and the whole register file size is 8KByte, and also detailing each assembly instruction etc.
Additionally, the first rev of this document was published last summer and the latest rev was published in June this year. During this whole time the T register 1KByte size has never changed (but more details have been included with each revision the past 12 months).
Further, glibc and various compilers have already included AMX extensions based on this spec. It would be quite catastrophic for them if intel suddenly cut the T reg size to 1024 bits.
Also, T reg size is not really new news. https://fuse.wikichip.org/news/3600/the-x86-advanc... published a pretty good article already last summer about this (also stating T regs are 1Kbyte).
Lastly, it makes no logical sense to only have 1024bit (128Bytes) tile regs because it is just too small.
Hence, you can safely assume that intel messed up on the slide and adjust your article accordingly. If you still don't believe it, ask intel yourself.
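To make the tile-file discussion above concrete, here is a minimal sketch of programming the 1KByte tiles with the AMX intrinsics from immintrin.h. The 64-byte configuration layout and the Linux permission constants follow the public AMX documentation, but treat the whole thing as an illustrative sketch rather than tested production code.

```c
// Minimal AMX tile-file configuration sketch (illustrative, based on the public AMX spec).
// Build (GCC/Clang with AMX support): cc -O2 -mamx-tile amx_cfg.c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

// 64-byte LDTILECFG/STTILECFG memory layout per the AMX documentation.
typedef struct {
    uint8_t  palette_id;   // byte 0: palette 1 = 8 tiles, 1 KiB each
    uint8_t  start_row;    // byte 1: used to restart a faulted tile load/store
    uint8_t  reserved[14]; // bytes 2..15
    uint16_t colsb[16];    // bytes 16..47: bytes per row for each tile (max 64)
    uint8_t  rows[16];     // bytes 48..63: rows for each tile (max 16)
} __attribute__((packed)) tile_cfg_t;

int main(void)
{
    // On Linux the kernel must grant AMX state permission first
    // (ARCH_REQ_XCOMP_PERM = 0x1023, XFEATURE_XTILEDATA = 18 per kernel docs).
    if (syscall(SYS_arch_prctl, 0x1023, 18) != 0)
        return 1;

    tile_cfg_t cfg;
    memset(&cfg, 0, sizeof(cfg));
    cfg.palette_id = 1;

    // Configure tile 0 as the full 16 rows x 64 bytes = 1 KiB,
    // e.g. a 16x16 block of INT32 accumulators for an INT8 dot product.
    cfg.rows[0]  = 16;
    cfg.colsb[0] = 64;

    _tile_loadconfig(&cfg);  // LDTILECFG: program the tile file
    _tile_zero(0);           // TILEZERO on tmm0
    _tile_release();         // TILERELEASE: return tiles to their init state
    return 0;
}
```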
schujj07 - Tuesday, August 31, 2021 - link
One of the rumors for Gen 4 Epyc is 12 channels of DDR5. Now this is just a rumor so it HAS to be taken with a grain of salt. However, if Epyc goes 12 channels, Arm goes 12 channels, and SPR is at 8 channels we could see another instance like Gen 1 & 2 Xeon Scalable not having RAM parity. While going DDR5 does increase the bandwidth, I don't think it does enough to justify not increasing the channels at the same time.
JayNor - Wednesday, September 1, 2021 - link
The four stacks of HBM, each with 8 channels of DDR, should take care of Intel's bandwidth issues for AI operations.
schujj07 - Wednesday, September 1, 2021 - link
Bandwidth might be OK for AI with HBM on SPR. One thing to remember is that most of these are going to be running on hypervisors. 6 channel RAM became immediately an issue with Xeon Scalable (especially with their old 1TB/socket limit without L series CPUs where you could only get 768GB RAM). If they only have 8 channels when everyone else has 12 channels you cannot put as much RAM into a system for cheap. Most servers are dual socket and if you are using a hypervisor RAM capacity matters A LOT. If you can have 1.5TB (dual socket, one 64GB DIMM on each of 12 channels) instead of 1TB (dual socket, one 64GB DIMM on each of 8 channels) that makes a huge difference for running VMs. All the hosts in my datacenter run with 1TB RAM & dual 32c/64t CPUs. We are not CPU limited but we are RAM limited on each host. While VMware can do RAM compression/ballooning, once you start over provisioning RAM you will start running into performance issues. I've read that after about 10-15% over provisioning on RAM you start getting pretty major performance loss. I've experienced VMs basically stall out (like what happened in the early 2000s when your computer used 512MB RAM and you only had 384MB RAM) at a 50% over provision. Basically, depending on the workload, bandwidth isn't everything.
Spunjji - Tuesday, September 7, 2021 - link
At what cost, though?
schujj07 - Tuesday, September 7, 2021 - link
If you have to ask you cannot afford it.
Noctrn - Tuesday, August 31, 2021 - link
It seems 99% likely that we're looking at 8 cores per tile for a max of 32 cores in the package. Intel has so far proven incapable of making a single piece of functional silicon with more than 8 large cores on it using anything smaller than 14nm.
Kamen Rider Blade - Tuesday, August 31, 2021 - link
All the rumors for Sapphire Rapids are pointing to 14 Cores per Tile in a 4x4 Grid.
2x of those nodes are for Vertical/Horizontal Interconnect management.
repoman27 - Tuesday, August 31, 2021 - link
Not exactly. 4x4 grid with 15 CPU core tiles and one IMC tile for the two DDR5 controllers. See my post further up for additional details and link to actual die shots.
thestryker - Tuesday, August 31, 2021 - link
Except for the entire Ice Lake Xeon line which scales up to 40 cores...
dullard - Tuesday, August 31, 2021 - link
Psst, don't go putting facts into discussions! It throws the rest of us off.
Kamen Rider Blade - Tuesday, August 31, 2021 - link
"*technically DDR5 puts two 32-bit channels on a single module, but as yet the industry doesn’t have a term to differentiate between a module with one 64-bit memory channel on it vs. a module with two 32-bit memory channels on it. The word ‘channel’ has often been interchangeable with ‘memory slot’ to date, but this will have to change."
What about calling them 2x 32-bit Sub-Channels?
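Whatever name the industry settles on, the bandwidth arithmetic is easy to spell out (DDR5-4800 is assumed here purely for illustration):

\[
\underbrace{4800\,\text{MT/s} \times 4\,\text{B}}_{\text{one 32-bit channel}} = 19.2\,\text{GB/s},\qquad
2 \times 19.2 = 38.4\,\text{GB/s per DIMM},\qquad
8\,\text{DIMMs} = 16\,\text{channels} \approx 307\,\text{GB/s per socket}
\]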
TeXWiller - Tuesday, August 31, 2021 - link
JEDEC calls them independent channels, that is 2 channels per module. Hence, an eight DIMM server board has a 16 channel memory system. I don't know why these terms are in flux everywhere.
Ian Cutress - Wednesday, September 1, 2021 - link
Because people have been interchanging module and channel for years, and one module = one channel. The fact that DDR5 moves down to 32-bit channels from 64-bit channels means I'm going to be sprinkling the word controller around to be absolutely specific.
kpb321 - Tuesday, August 31, 2021 - link
The lower core count versions will certainly be interesting.
If the comments are correct and it is 14 cores per tile then you'd have 56 cores max. Certainly you could see them doing 52 and 48 core versions from die harvesting with 1 or 2 cores disabled per tile, but the further below that you go the less it makes sense. On the other hand, looking at the high level chip diagrams you pretty much have IO going around the entire outside of the cluster of tiles. I'm not sure how much smaller you can make the tiles and still have enough room for all the IO. What's the min core count going to look like? Is there going to be a 16 or 20 core version? Are they still going to use tiles for those or design a different monolithic die?
Spunjji - Friday, September 3, 2021 - link
It would be logical to infer that they're going to need at least one more monolithic design, to allow for designs with fewer tiles with the same number of memory channels.
Unless they just leave the lower-core designs with less memory bandwidth, which would be a product segmentation strategy of sorts, I guess?
Silver5urfer - Tuesday, August 31, 2021 - link
Finally something innovative from Intel after years of abandonment of HEDT and Xeon leadership. I would give credit to Intel here because there's no big.LITTLE BS scam. Mirroring the die mask design and using separate dies stitched with EMIB on such large silicon, damn, it looks super complicated vs AMD's chiplet strategy.
Now for the worst part, the I/O controller and memory controller. That is going to be an issue for this design. Intel's mesh had power problems and inferior x86 performance on SKL Xeon, and Ice Lake solved that issue, but the core problem is that AMD consolidated the EPYC 7000 series Zen based chiplets onto a central I/O die and memory controller design, eliminating the NUMA node performance hit. With smaller path traces due to EMIB this looks great, but it will still have the hopping issues of Zen design based processors.
So a SPR based HEDT LGA4xxx socket is coming, but when? 2022? Zen 3 Threadripper Chagall / Genesis Peak is coming this year. And Zen 3 based V-Cache Milan EPYC will be coming next year once the factories start producing them, and they will be trickled down to AM4 socket processors. SPR needs to prove a lot; Zen 4 is dropping soon with 96C and beastly IPC on top of a 12 Channel memory design on TSMC 5N.
IPC is a whole other equation, BUT most important is how the Intel 7 / 10nm design is vs TSMC 7N based EPYC in terms of clock potential and efficiency to performance ratios. Esp the fact that Intel had to cut off the x86 cores into those small SKL inferior crap cores onto the LGA1700 socket to keep up with the rising power consumption of their x86 processor designs. This one maxes out at 56C apparently with each tile at 14C; a big shame all these 14C couldn't make it to LGA1700, they would have been perfect for desktop compute, but for those stupid thin and light BGA junk sales they axed it and shoved those efficiency designs into the Desktop LGA platform.
Kamen Rider Blade - Tuesday, August 31, 2021 - link
The NUMA Domain is going to be interesting with the 4x Memory Controllers split on each die having to cross domain boundaries.
And there appear to be 2x Cross-Tile interconnects on each tile that hold a CHA (Caching and Home Agent) and an LLC (Last Level Cache) to handle resolution of coherency across multiple processors.
Each core has a tiny CHA to check against the larger dedicated CHA & LLC to improve scalability according to the number of cores, memory controllers, or the sub-NUMA clustering mode. The address space is interleaved across different CHAs, which act like a single logical agent.
So that's an interesting solution to their data locality issues when multi-threading or having cross core communication.
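Whichever way Intel carves it up, software only sees the NUMA nodes that the firmware advertises: one per socket by default, or several per socket if sub-NUMA clustering is enabled. A minimal libnuma sketch for inspecting and pinning against that topology (illustrative only; link with -lnuma):

```c
// Query the NUMA topology the platform exposes and allocate memory on a
// specific node. Illustrative sketch; build with: cc numa_demo.c -lnuma
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "No NUMA support exposed by this kernel/platform\n");
        return 1;
    }

    // With sub-NUMA clustering enabled, one socket shows up as multiple
    // nodes here; otherwise it is typically one node per socket.
    int nodes = numa_num_configured_nodes();
    printf("Configured NUMA nodes: %d\n", nodes);

    // Keep the calling thread and a 64 MiB buffer on node 0 so the cores
    // and the memory controllers serving them stay in one domain.
    size_t len = 64UL << 20;
    numa_run_on_node(0);
    void *buf = numa_alloc_onnode(len, 0);
    if (!buf)
        return 1;

    numa_free(buf, len);
    return 0;
}
```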
repoman27 - Tuesday, August 31, 2021 - link
Why do you presume there will be any NUMA domain boundaries on-package? The whole point of going with EMIB and tiles vs. conventional chiplets on an organic substrate is that the EMIBs essentially act as "long wires" bridging the metal layers of the individual dies and extending the mesh without having to pipe the data over a separate link with a SerDes / PHY at each end.
JayNor - Tuesday, August 31, 2021 - link
The leaked Emerald Rapids slides show CXL.mem. With the info that Sapphire Rapids doesn't implement CXL.mem, that finally makes sense.
CXL.mem isn't mandatory according to the servethehome article, "Compute Express Link or CXL What it is and Examples", from May 21, 2021.
JayNor - Tuesday, August 31, 2021 - link
The CXL.mem feature in Emerald Rapids can be seen in the slides in the adoredtv transcription article, "intel-rapidly-getting-worse", from June 29, 2021.
Wereweeb - Tuesday, August 31, 2021 - link
Not what I wanted to see, but EMIB is a pretty cool piece of tech.
Blastdoor - Wednesday, September 1, 2021 - link
Itanium, Optane, Knight's Landing, and Intel's original 10nm plans were all cool pieces of tech, too. Yet they lost to less cool tech like AMD64, flash, GPUs, and TSMC 7nm.
After reading all this I'm left wondering if Intel designed this thing to showcase how cool EMIB is rather than EMIB enabling the optimal design (aka, a misalignment between cart and horse).
I look forward to seeing how this super cool glue compares to the combo of AMD's Elmer's plus a big slab of SRAM plus high yielding multi-use chiplets.
Let's also see how yields and watts look.
wira6444 - Tuesday, August 31, 2021 - link
I thought Intel hated "GLUE"?
Kamen Rider Blade - Tuesday, August 31, 2021 - link
They changed their mind.
Oxford Guy - Thursday, September 2, 2021 - link
'Truth as convenience' is one of the defining qualities of corporate reasoning.
coburn_c - Tuesday, August 31, 2021 - link
Lot of redundancies, guess they were shy to move controllers to separate silicon... especially 14nm silicon...
TeXWiller - Tuesday, August 31, 2021 - link
I forgot to ask about the Alder Lake AVX support. I quickly looked at Intel's presentation and it cast AMX acceleration as an enterprise feature in no uncertain terms, which the slides show in this article as well (some journalists got confused at this point). However, the AVX512 support was simply thrown in as a feature of all P-cores without a distinction. Where did you get the information on the status of the consumer AVX512 support?
thestryker - Tuesday, August 31, 2021 - link
ADL E-cores do not have AVX512 and there's no effective way currently to deal with having two different levels of AVX support in the same processor. If you read through the ADL coverage here on AnandTech you'll see it isn't in any of the slides, and I believe it was Ian who confirmed they're fusing it off for ADL.
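For context on why mixed AVX levels in one package are awkward: the usual answer on homogeneous machines is runtime dispatch, as in the hypothetical sketch below using the GCC/Clang CPU-feature builtins. The catch is that the check runs once per process rather than per core, so a thread migrated onto a core without AVX-512 after the check would fault.

```c
// Classic runtime ISA dispatch. This works when every core in the machine
// reports the same features; it breaks down if AVX-512 exists on some cores
// only, because the scheduler can move the thread after the check.
#include <stdio.h>

static void kernel_avx512(void) { puts("using AVX-512 path"); }
static void kernel_avx2(void)   { puts("using AVX2 path"); }
static void kernel_scalar(void) { puts("using scalar path"); }

int main(void)
{
    __builtin_cpu_init();  // populate CPU feature info (GCC/Clang builtin)

    if (__builtin_cpu_supports("avx512f"))
        kernel_avx512();
    else if (__builtin_cpu_supports("avx2"))
        kernel_avx2();
    else
        kernel_scalar();
    return 0;
}
```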
JayNor - Wednesday, September 1, 2021 - link
I think their dpc++ could handle it just as easily as handling the CPU vs GPU differences. Just need to add another device type.
psyclist80 - Tuesday, August 31, 2021 - link
Hey look intel woke up! welcome to 2021! things have changed quite a bit, good luck!
diediealldie - Tuesday, August 31, 2021 - link
HBM variants are quite interesting. HBMs can work as an L4 cache, but they have latency penalties due to lower clocks (thermal limits). Quite curious how high latency + high bandwidth will work with various workloads.
JayNor - Wednesday, September 1, 2021 - link
Intel commented that HBM wasn't fast enough for all their needs on Ponte Vecchio. They added large L1 and L2 caches and a Rambo Cache that has yet to be fully described.
emn13 - Wednesday, September 1, 2021 - link
I'm almost disappointed they're not using Gracemont - I suppose that wasn't realistic. In any case, either the tantalizing leaks of Gracemont's perf and efficiency are false, or it would be a fantastic server chip - almost 4x the cores, albeit significantly slower ones? Yes please!
Calin - Wednesday, September 1, 2021 - link
They would run straight into the ARM powered microprocessors - see https://aws.amazon.com/ec2/graviton/
Intel doesn't really want to compete at the low end (many cheap cores), its entire marketing message is "we make fast and expensive microprocessors in the server market".
emn13 - Wednesday, September 1, 2021 - link
I guess, but it's still a shame - and simply having something *available* would discourage people from bothering to switch architecture, you'd think.
Then again, perhaps AMD's EPYC is good enough for that, at least for now.
Silver5urfer - Wednesday, September 1, 2021 - link
Knights Landing was an Atom based platform. Tons of Atom cores on a giant LGA4xxx socket and it died. The only reason they shoved those pathetic cores onto LGA1700 is to compete in SMT applications against Ryzen without blowing a hole in the power consumption. Once we get the TDP of these new Xeon SPR processors we will get an idea on how a 14C tile operates; they avoided going 14C on the desktop also due to scaling them onto the crappy BGA trashware where their efficiency works to combat Apple M series and AMD's APUs. That is the highest margin for Intel apart from the Datacenter business of Xeon. Their desktop is the lowest priority of all 3. So they do not care about that, but they wasted a ton of cash on that stupid Intel Thread Director.
drothgery - Wednesday, September 1, 2021 - link
They've made some "lots of Atom cores" processors before; it wouldn't be too shocking if they did again. Though swapping out 14X Golden Cove tiles for 56X Gracemont tiles to build a 224-core monster seems ... improbable.
Duncan Macdonald - Wednesday, September 1, 2021 - link
Power consumption?
Recent Intel CPUs have been power hogs when the frequency is raised sufficiently to start to compete with the AMD CPUs. (So much so that Cloudflare gave up on Intel CPUs in its recent servers (See https://www.theregister.com/2021/09/01/cloudflare_... for more details).)
For any large scale deployment of servers, the power consumption of the servers is as important as their performance. A quote from a Cloudflare engineer: "Although Intel's chips were able to compete with AMD in terms of raw performance, the power consumption was several hundred watts higher per server – that's enormous."
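A rough back-of-the-envelope for what "several hundred watts higher per server" means at scale, assuming a 300 W delta and $0.10/kWh (both numbers are my own illustrative assumptions, not Cloudflare's):

\[
300\,\text{W} \times 8760\,\text{h/yr} \approx 2628\,\text{kWh/yr} \approx \$263\ \text{per server per year (before cooling overhead)};\qquad
10{,}000\ \text{servers} \Rightarrow \approx \$2.6\text{M/yr}
\]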
Ian Cutress - Wednesday, September 1, 2021 - link
Product information not disclosed. This was an architecture presentation.
TristanSDX - Wednesday, September 1, 2021 - link
no benches :(
edzieba - Wednesday, September 1, 2021 - link
"This means that if SPR is going to offer versions with fewer cores, it is going to either create dummy tiles without any cores on them, but still keep the PCIe/DDR5 as required, or quite simply those lower core counts are going to have fewer memory controllers."
Or these mirrored die pairs will have very aggressive stratified binning.
msroadkill612 - Wednesday, September 1, 2021 - link
Intel rely on short memories.
When Zen appeared, they were notorious for forcing multi-socket servers (gouging) on folks who really only needed single sockets - the exact opposite of aiming for better processor interconnects.
They know how vital TDP is as a metric.
If their solution is so competitive - why does it push the maximum acceptable TDP limits? It smacks of desperation.
wrkingclass_hero - Thursday, September 2, 2021 - link
It looks like they pulled out a dusty chip from under the fridge for that press pick
wrkingclass_hero - Thursday, September 2, 2021 - link
*pic
Alexvrb - Sunday, September 5, 2021 - link
"The word ‘channel’ has often been interchangeable with ‘memory slot’ to date, but this will have to change."
I get what you're saying, but as-worded that hasn't been true like... ever. I'm typing this on a newfangled dual channel system with 4 (count em, 4!) memory slots. I know it would be confusing to say you've got 16 x 32 bit channels for 8 or 16 modules, people might think you're only enabling half the channels with 8 modules. You could always specify double-channel modules or something of that nature.
JayNor - Tuesday, September 7, 2021 - link
"One of the critical deficits Intel has to its competition in its server platform is core count"If they chose to go to 256 Gracemont cores in a package, would the competition refer to them as cores?
nhgzf - Saturday, September 11, 2021 - link
"Set to launch in 2022, Sapphire Rapids will be Intel’s first CPU product to take advantage of a multi-die architecture". Doesn't that comment forget the Pentium D, Kentsfield, etc?
Ian Cutress - Sunday, September 12, 2021 - link
Your quote left out the most pertinent word: 'modern'.
OstensiblyRandom - Tuesday, October 5, 2021 - link
Re. LLC and retaining 8 memory channels - I surmise that unnecessary UPI components are reconfigurable as... FPGA-ish perhaps?
Also I'd guess the HBM versions wouldn't need more pins, just a fat package, and fewer pins in the sans-DDR version.
Also, can we have more focus, a dedicated article even, on the workstation (SP) variant? It's been gone too long - can't let Intel forget they haven't released a processor that interests actual computing professionals for years.