The Intel Xeon D Review: Performance Per Watt Server SoC Champion?
by Johan De Gelas on June 23, 2015 8:35 AM EST
Broadwell in a Server SoC
In a nutshell, the Xeon D-1540 is two silicon dies in one highly integrated package. Eight 14 nm Broadwell cores, a shared L3 cache, a dual 10 gigabit MAC, and a PCIe 3.0 root with 24 lanes find a home in the SoC die, while the same package also contains a PCH die that integrates four USB 3.0 controllers, four USB 2.0 controllers, six SATA3 controllers, and a PCIe 2.0 root.
The Broadwell architecture brings small microarchitectural improvements - Intel claims about 5.5% higher IPC in integer processing. Other improvements include slightly lower VM exit/enter latencies, something that Intel has been improving with almost every recent generation (excluding Sandy Bridge).
Of course, if you are in the server business, you care little about small IPC improvements, so let us focus on the large, relevant ones. The big improvements over the Xeon E3-1200 v3 are:
- Twice as many cores and threads (8/16 vs 4/8)
- Support for 32 GB instead of 8 GB per DIMM, and for DDR4-2133
- Maximum memory capacity has quadrupled (128 GB vs 32 GB)
- 24 PCIe 3.0 lanes instead of 16 PCIe 3.0 lanes
- 12 MB L3 rather than 8 MB L3
- No separate C22x chipset necessary for SATA / USB
- Dual 10 Gbit Ethernet integrated ...
And last but not least, the Xeon D gets RAS (Reliability, Availability and Serviceability) features that are much closer to those of the Xeon E5.
The only RAS features missing in the Xeon D are the expensive ones, such as memory mirroring. Those RAS features are very rarely used, and the Xeon D cannot offer them anyway, as it has only one memory controller.
Compared to the Atom C2000, the biggest improvement is that the Broadwell core is vastly more advanced than the Silvermont core. And that is not all:
- The Atom C2000 has no L3 cache and is thus a lot slower in situations where the cores have to synchronize a lot, such as databases (see the sketch after this list)
- No support for USB 3 (Xeon D: four USB 3 controllers)
- As far as we know, Atom C2000 server boards were limited to two 1 Gbit PHYs (unless you added a separate 10 GbE controller)
- No support for PCIe 3.0, "only" 16 PCIe Gen2 lanes.
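To make the L3 point concrete, here is a minimal sketch of a sync-heavy workload. It is our own illustration, not one of the benchmarks used in this review: eight threads hammer a single shared counter, so the cache line holding it bounces from core to core. On the Xeon D that bounce resolves in the shared L3; on a design without a shared last-level cache, the transfer is considerably more expensive.

```c
/* Minimal sketch (our illustration): threads synchronizing through one
 * shared cache line. Every atomic increment forces the core to obtain
 * exclusive ownership of the line, so it ping-pongs between the cores.
 * With a shared L3 (Xeon D) that transfer stays on-die and is cheap;
 * without one it is considerably more expensive.
 * Hypothetical build line: gcc -O2 -pthread counter.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define THREADS 8
#define ITERS   10000000UL

static atomic_ulong shared_counter;   /* the single contended cache line */

static void *worker(void *arg)
{
    (void)arg;
    for (unsigned long i = 0; i < ITERS; i++)
        atomic_fetch_add(&shared_counter, 1);
    return NULL;
}

int main(void)
{
    pthread_t t[THREADS];

    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);

    printf("counter = %lu (expected %lu)\n",
           (unsigned long)atomic_load(&shared_counter),
           (unsigned long)THREADS * ITERS);
    return 0;
}
```

Build it with something like `gcc -O2 -pthread counter.c` and time it with `time ./a.out`; give each thread its own counter instead, and the contention, along with the benefit of a fast shared cache, disappears.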
There are more subtle differences of course such as using a crossbar rather than a ring, but those are beyond the scope of this review.
90 Comments
JohanAnandtech - Wednesday, June 24, 2015 - link
Hi Patrick, the base clock of our chip is 2 GHz, not 1.9 GHz like the pre-production version that we got from Intel. I still have to check the turbo clocks, but I do believe we measured 2.6 GHz. I'll double-check.

pjkenned - Wednesday, June 24, 2015 - link
Awesome! Our ES ones were 1.9 GHz.

Chrisrodinis1 - Tuesday, June 23, 2015 - link
For comparison, this server uses Xeons. It is the HP ProLiant BL460c G9 blade server: https://www.youtube.com/watch?v=0s_w8JVmvf0

MrDiSante - Wednesday, June 24, 2015 - link
Why use only -O2 when compiling the benchmarks? I would imagine that in order to squeeze out every last bit of performance, all production software is compiled with all optimizations turned up to 11. I noticed that their GitHub uses -O2 as an example - is it that TinyMemBenchmark just doesn't play nice with -O3?

JohanAnandtech - Wednesday, June 24, 2015 - link
The standard makefile had no optimization whatsoever. If you want to measure latency, you do not want maximum performance but rather accuracy, so I played it safe and used -O2. I am not convinced that all production software is compiled with every optimization turned on.
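To illustrate the accuracy concern: a latency benchmark boils down to a long chain of dependent loads. The sketch below is our own simplification, assuming gcc on Linux - the file name, sizes, and the fixed stride are ours, and a real tool like TinyMemBench randomizes the chain to defeat the prefetchers - but it shows why keeping the result observable matters more than cranking up the optimizer:

```c
/* Minimal sketch (our illustration) of what a memory latency benchmark
 * does: a long chain of loads where each address depends on the value
 * just loaded, so the CPU cannot overlap them. If the optimizer can
 * prove the chain's result is unused, it may delete the loads you are
 * trying to time - which is why accuracy, not -O3, is the priority.
 * Hypothetical build line: gcc -O2 chase.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES  (1UL << 22)   /* 32 MB of pointers: larger than the 12 MB L3 */
#define CHASES 10000000UL    /* number of dependent loads to time */

int main(void)
{
    void **ring = malloc(NODES * sizeof *ring);
    if (!ring) return 1;

    /* Link the slots into one big cycle with a fixed odd stride. A real
     * tool (e.g. TinyMemBench) randomizes the chain so the hardware
     * prefetcher cannot guess the next address; we keep it simple here. */
    for (size_t i = 0; i < NODES; i++)
        ring[i] = &ring[(i + 9973) % NODES];

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    void **p = ring;
    for (size_t i = 0; i < CHASES; i++)
        p = *p;              /* serialized: each load needs the previous one */

    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Printing p keeps the chase observable, so -O2 cannot remove it. */
    printf("%.2f ns per load (p=%p)\n", ns / (double)CHASES, (void *)p);

    free(ring);
    return 0;
}
```

Comment out the final printf and a sufficiently aggressive optimizer is entitled to delete the entire loop, leaving the "benchmark" to report the cost of an empty loop.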
diediealldie - Wednesday, June 24, 2015 - link
Intel seems to be disARMing them... X-Gene 2 doesn't look so promising, as they'll have to fight mighty Skylake-based Xeons, not Broadwell ones.
Thanks for the great article again.
jfallen - Wednesday, June 24, 2015 - link
Thanks Johan for the great article. I'm a tech enthusiast and will never buy or use one of these, but it makes great reading, and I appreciate the time you take to research and write the article.

Regards,
Jordan
JohanAnandtech - Wednesday, June 24, 2015 - link
Happy to read this! :-)

TomWomack - Wednesday, June 24, 2015 - link
This looks very much consistent with my experience: the disconcertingly high idle power (I looked at the board with a thermal camera; the hot chips were the gigabit PHY, the inductors for the power supply, and the AST2400 management chip), the surprisingly good memory performance, the fairly hot SoC (running sixteen threads of number-crunching I get a power draw of 83 W at the plug), and the generally pretty good computation.

I'm not entirely sure it was a better buy for my use case than a significantly cheaper six-core Haswell-E - Haswell-E is not that hot, electricity not that expensive, and from my supplier the X10SDV-F board and memory were £929, whilst Scan could get me an i7-5820K board, CPU and memory for £702. And four-channel DDR4 probably is usefully faster than two-channel for what I do.
I quite strongly don't believe in server mystique - the outbuilding is big enough that I run out of power before I run out of space for micro-ATX cases, and I am lucky enough to be doing calculations which are self-checking to the point that ECC is a waste of money.
JohanAnandtech - Wednesday, June 24, 2015 - link
Hi Tom, I believe we saw up to 90 W at the wall when running OpenFOAM (10 Gbit enabled). However, that is less relevant for this chip, which is not meant to be an HPC part, as we have shown in the article. HPC really screams for an E5.