Conclusions & Thoughts on Dense Compute

Truth be told, when I was in discussions with Supermicro about reviewing one of its Ice Lake systems, I wasn't sure what to expect. I spoke to my contact at the company about sending a system expected to be a popular all-around enterprise offering, one that could serve many of Supermicro's markets, and the SYS-120U-TNR fits that bill, with the understanding that companies are also requesting denser environments.

The desire to move from previously standard 2U designs to 1U designs, even for generic dual-socket systems, seems to be a feature of this next generation of enterprise deployments. Data centers and colocation facilities have built infrastructure to support high-powered racks for AI: enterprises that run super-dense AI workloads now invest in 5U systems consuming 5 kW or more, enough that you can't even fill a 42U rack without exceeding standard rack power limits unless high-power infrastructure is in place. The knock-on effect of this better colo and enterprise infrastructure is that customers using generic all-round off-the-shelf systems can cut their racks of 2U infrastructure in half. This comes on top of any benefit of moving from an older generation of processor to the new one.
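To put the rack-power point in rough numbers, a quick sketch. The 5 kW-per-5U figure comes from the text; the ~15 kW rack budget is an assumed "standard" colocation allowance, and actual limits vary widely by facility:

```python
# Rack power density sketch: a 42U rack filled with 5U AI systems at
# 5 kW each (figures from the article). The 15 kW rack power budget is
# an assumed typical colo allowance, not a quoted figure.
RACK_UNITS = 42
SYSTEM_UNITS = 5
SYSTEM_POWER_KW = 5.0
RACK_BUDGET_KW = 15.0

systems_by_space = RACK_UNITS // SYSTEM_UNITS               # fits by rack space
power_if_filled_kw = systems_by_space * SYSTEM_POWER_KW     # draw if fully populated
systems_by_power = int(RACK_BUDGET_KW // SYSTEM_POWER_KW)   # fits by power budget

print(f"fit by space: {systems_by_space} systems -> {power_if_filled_kw} kW")
print(f"fit by power budget: {systems_by_power} systems")
```

On these assumptions a space-filled rack would draw 40 kW, yet the power budget caps out at three systems, which is exactly why high-power infrastructure has to come first.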

This causes a small issue for those of us who review servers every now and then: a modern dual-socket server with some good CPUs can no longer be tested in a home rack without ear protection. Normally it would be tested in a lower-than-peak fan mode, without additional thermal assistance; these systems, however, require either fans at full speed or additional HVAC just to run standard tests. A modern datacenter lets these systems run as loud as they need to, with a cooling environment optimized for performance density regardless of fan speed. Enterprise customers are taking advantage of this at scale, and that's why companies like Supermicro are designing systems like the SYS-120U-TNR to meet those needs.

Dense Thoughts on Compute

What I think Supermicro is trying to do with the SYS-120U-TNR is cater to the biggest portion of demand across a variety of use cases. The system could be a single-CPU caching tier or a multi-tiered database processing hub; it could be used for AI acceleration in both training and inference; add a dual-slot NVIDIA GPU with a virtualization license and it could host several remote workers with CUDA access; with multiple FPGAs it could be a hub for SmartNIC offload or development. I applaud the fact that Supermicro has quite capably built an all-round machine that can be configured to cater to so many markets.

One slight drawback from my perspective is the lack of a default network interface – even a simple gigabit connection – without an add-in card. Supermicro won't ship the system without an add-in NIC anyway, but users will either have to add their own PCIe solution (taking up a slot) or rely on one of Supermicro's Ultra Riser networking cards drawing PCIe lanes from the processor. One could argue that Supermicro's decision allows for better flexibility, especially when space at the rear of the system is limited, but I'm still of the opinion that at least something should be there, hanging off the chipset.

On the CPU side of things, as we noted in our Intel 3rd Generation Xeon Scalable Ice Lake review, the processors themselves offer an interesting increase in generational performance, as well as key optimization points for things like AVX-512, SGX enclaves, and Optane DC Persistent Memory. The move up to PCIe 4.0, eight channels of DDR4-3200 memory, and a focus on an optimized software stack are plus points for the product, but if your workload falls outside those optimizable use cases, AMD's equivalent offerings seem to deliver more performance for the same cost, or in some instances at a lower cost and lower power.

The Xeon Gold 6330s we are testing today are the updates to the 28-core Xeon Gold 6258R from the previous generation, running at the same power but at half the cost and much lower frequencies. There's a trade-off: the 6330 isn't as fast, yet consumes the same power – by charging half as much for the processor, Intel is trying to push the TCO equation to where it needs to be for its customers. The Ice Lake Xeon Gold 6348 is closer in frequency to the 6258R (2.6 GHz base vs 2.7 GHz base) and closer in price ($3072 list vs $3950), but despite the slightly lower frequency it is rated at a higher TDP (235 W vs 205 W). In our Ice Lake review, the new 8380 beat the older 8280 because the power was higher, there were more cores, and we saw an uplift in IPC. The question now lies in the mid-range: while Intel struggles to make its new CPUs match the old without changing list pricing, AMD lets customers move from dual socket to single socket while increasing performance and reducing power costs.
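As a rough illustration of how that TCO equation might shake out, consider the 6348 vs the 6258R. The list prices and TDPs below are from the paragraph above; the all-in power rate and the three-year service life are illustrative assumptions only:

```python
# Rough per-socket TCO sketch: Xeon Gold 6348 (Ice Lake) vs Xeon Gold
# 6258R (previous gen). Prices and TDPs are from the article; the
# $0.20/kWh all-in rate (power plus cooling) and 3-year service life
# are assumed figures for illustration.
PRICE_6348, TDP_6348_W = 3072, 235
PRICE_6258R, TDP_6258R_W = 3950, 205
RATE_PER_KWH = 0.20
YEARS = 3
HOURS_PER_YEAR = 8760

def energy_cost(tdp_w: float) -> float:
    # Worst case: assumes the CPU sits at TDP around the clock.
    return tdp_w / 1000 * HOURS_PER_YEAR * YEARS * RATE_PER_KWH

price_saving = PRICE_6258R - PRICE_6348
extra_power_cost = energy_cost(TDP_6348_W) - energy_cost(TDP_6258R_W)

print(f"up-front saving: ${price_saving}")
print(f"extra power cost over {YEARS} years: ${extra_power_cost:.0f}")
```

On these assumptions the $878 list-price saving comfortably outweighs the roughly $158 of extra energy from the 30 W higher TDP over three years, though the balance shifts with electricity rates, utilization, and deployment lifetime.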

This is somewhat inconsequential for today's review, in that Supermicro's system caters to customers that require Intel for their enterprise infrastructure, regardless of processor performance. The SYS-120U-TNR is versatile and configurable for a lot of markets, ripe for 'off-the-shelf' system deployment.

53 Comments

  • SSNSeawolf - Monday, July 26, 2021 - link

    I'm curious why you say AMX is like AVX-8192. My understanding is that AMX is essentially a configurable fused multiply-add accelerator, with the added bonus of some configuration registers. However, I'm not an AI guy so I welcome corrections.
  • mode_13h - Monday, July 26, 2021 - link

    > I'm curious why you say AMX is like AVX-8192.

    Because that's how big the registers are. 1 kB each (there are 8 of them, BTW). As for the configurable part, it's true that operations don't have to use the entire register.

    I'm not saying it *is* AVX-8192. Just that you could sort of look at it that way. The point was only to tie it into the lineage of what came before. For anything beyond that, you'll want to dig into the specifics and understand it for what it *is*.
  • mode_13h - Sunday, July 25, 2021 - link

    If there's one thing Intel knows how to do, it's more of what they've done before!
  • Foeketijn - Thursday, July 22, 2021 - link

    Power and cooling is not cheap in a colo. Using 300W more for the same performance will set you back 1000 bucks a year easily.
  • mode_13h - Thursday, July 22, 2021 - link

    Yeah, I'd have expected power-efficiency to be the top priority, followed by density.
  • Spunjji - Monday, July 26, 2021 - link

    Ouch!
  • mode_13h - Thursday, July 22, 2021 - link

    Ian, the AVX 3DPM benchmark is concerning me. Given the grossly asymmetric optimization for AVX-512 vs. AVX2, I think it's not a good performance characterization for AVX2 vs. AVX-512 CPUs.

    If the AVX2 path could be optimized to a similar degree, then I think it would make sense to use it in that way. Unless/until that happens, I think you should only use it to compare like-for-like CPUs (i.e. AVX2 vs AVX2; AVX-512 vs AVX-512).

    On a related note, please post the source somewhere like github, so that we actually see what it's measuring and potentially have a go at optimizing the AVX2 path, ourselves.
  • 29a - Thursday, July 22, 2021 - link

    I've also been complaining about ego mark forever and now they added that terrible AI benchmark to the lineup which they readily admit is bad data.
  • mode_13h - Thursday, July 22, 2021 - link

    He should just put it up on github and see what people can do with it. Plus, somebody might optimize it for ARM, too. He's already shared it with Intel and AMD, so what's the big deal?
  • Dolda2000 - Thursday, July 22, 2021 - link

    I don't think there's anything particularly wrong with that. It may be disproportionate to other benchmarks, but if all benchmarks scaled the same, there'd be no point in having more than one at all. It's a real-world workload (custom in-house programs are perhaps the most real-world workloads there are), and it does demonstrate the fact that some programs really benefit by AVX-512.

    Realistically, I don't think it should've been shared with Intel and AMD (it would've arguably been better if it were "pristine"), but given that that has been done, I'd agree there's no point to not making it public any longer. That being said, I'm not sure the point should be to microoptimize it to the ends of the world, or it wouldn't be a realistic workload any longer.
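
The power-cost figures quoted in the comments above are easy to sanity-check. The electricity rates below are assumed figures; colocation billing typically adds a cooling/PUE markup on top of the raw energy price:

```python
# Sanity check of the "300 W extra ~= $1000/year" estimate from the
# comments. Both per-kWh rates are assumed illustrative figures, not
# quoted prices.
EXTRA_W = 300
HOURS_PER_YEAR = 8760

extra_kwh = EXTRA_W / 1000 * HOURS_PER_YEAR  # 2628 kWh per year

for label, rate in [("raw energy @ $0.12/kWh", 0.12),
                    ("all-in colo @ $0.38/kWh", 0.38)]:
    print(f"{label}: ${extra_kwh * rate:.0f}/year")
```

At a raw energy rate the figure is closer to $300 a year, but with a plausible all-in colo rate covering cooling and overhead, the $1000-a-year estimate holds up.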
