Intel Offers More Cascade Lake-AP Performance Numbers
by Ian Cutress on November 11, 2018 6:15 PM EST- Posted in
- Enterprise
- Intel
- HPC
- Enterprise CPUs
- Cascade-AP
One of last week's announcements was Intel's new Cascade Lake Advanced Performance (Cascade Lake-AP) category of processors. These new processors combine two 24-core Cascade Lake-SP processors on a single package substrate to offer a single-socket 48-core option with a total of twelve memory channels. The Cascade Lake-AP parts are due to launch next year, and until then Intel is putting out some internal benchmark numbers.
Vendor Benchmark Results
When we are this far away from a product launch, all benchmark numbers should be taken with a grain of salt. This goes double for vendor-supplied benchmarks. However, Intel is on the warpath to promote what it sees as a new product family within its portfolio, even if it is only set to come out next year.
At the announcement last week, Intel offered Linpack and Stream Triad as its two main high-performance comparison metrics. Today Intel is also offering more ‘real world’ metrics. These metrics are, to quote Intel, ‘estimates based on pre-production hardware’. This means that the hardware is not ready yet, and that these values come from running engineering samples and are then extrapolated to an expected benchmark value. Add another dump truck of salt on these numbers.
Intel’s official list of results is as follows:
Intel's Benchmark Numbers for 2S 48 Core Cascade Lake-AP*
Benchmark | Type | Score vs 2S EPYC 7601
Linpack | Numerical Linear Algebra | 3.4x
Stream Triad | Memory Bandwidth | 1.3x
MILC | Quantum Chromodynamics | 1.5x
WRF | Weather Forecasting | 1.6x
OpenFOAM | Computational Fluid Dynamics | 1.6x
NAMD (APOA1) | Molecular Dynamics | 2.1x
YASK (ISO 3DFD) | HPC Kernel Tuning | 3.1x
*These numbers were created by Intel
The slide with this data is in the gallery below.
In each case, Intel is comparing a dual-socket Cascade Lake-AP system with a dual-socket EPYC 7601 system. Intel’s information slides go through how it set up all of its AMD systems in detail. However, they do not disclose how the Cascade Lake-AP systems were set up by comparison, presumably so as not to reveal any additional configuration details.
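Because the two systems differ in core count (2 x 48 for Cascade Lake-AP against 2 x 32 for EPYC 7601), one rough way to read Intel's multipliers is to divide each by the 1.5x core-count ratio. The sketch below does only that arithmetic on the figures Intel listed. The names `claimed_speedup` and `per_core_ratio` are illustrative, and the normalization itself is an assumption: it ignores clock speeds, SMT settings, compilers, and memory configuration, and it is not very meaningful for a bandwidth-bound test such as Stream Triad.

```python
# Intel's claimed system-level speedups (2S Cascade Lake-AP vs 2S EPYC 7601),
# copied from Intel's listed results. These are vendor estimates, not measurements.
claimed_speedup = {
    "Linpack": 3.4,
    "Stream Triad": 1.3,
    "MILC": 1.5,
    "WRF": 1.6,
    "OpenFOAM": 1.6,
    "NAMD (APOA1)": 2.1,
    "YASK (ISO 3DFD)": 3.1,
}

INTEL_CORES = 2 * 48  # two 48-core Cascade Lake-AP packages
AMD_CORES = 2 * 32    # two 32-core EPYC 7601 processors


def per_core_ratio(speedup, intel_cores=INTEL_CORES, amd_cores=AMD_CORES):
    """Divide a system-level speedup by the core-count ratio (hypothetical helper)."""
    return speedup / (intel_cores / amd_cores)


# Rounded per-core figures; a value near 1.0 means the claimed lead is roughly
# what the extra cores alone would explain under this simple assumption.
per_core = {name: round(per_core_ratio(s), 2) for name, s in claimed_speedup.items()}

for name, ratio in per_core.items():
    print(f"{name:<16} {claimed_speedup[name]:.1f}x system -> {ratio:.2f}x per core")
```

Under this simple view, a result such as MILC at 1.5x works out to parity per core, while Linpack and YASK remain well above it.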
For the most part, we don’t put much stock into vendor supplied benchmark numbers. It’s easy for a vendor to claim a multiple when doubling particular compute resources, but when it comes to real world tests, companies like Intel have to try and promote their future products to potential customers. This is what this is. However, no matter how many numbers come out, these are impossible to verify independently. Wait until the AnandTech review, of course.
Intel also disclosed a number of ecosystem partners who are getting ready to deploy Cascade Lake-AP, as well as an official declaration of the Cascade Lake-AP deployment at HLRN.
Cascade Lake-AP is set to be launched alongside the Cascade Lake-SP in the first part of 2019, although Intel states that Cascade Lake-SP will ship for revenue in 2018. This week we are at the Supercomputing trade show - hopefully there will be a demo somewhere that we’ll be able to see and talk about.
52 Comments
Spunjji - Monday, November 12, 2018 - link
What a bargain - the product provided its own viking funeral at no additional cost. :D
muziqaz - Monday, November 12, 2018 - link
This is 64 AMD cores vs 96 intel cores, since SMT is disabled in 7601.
yannigr2 - Monday, November 12, 2018 - link
In Linpack testing, Intel decided that disabling SMT on EPYC was the correct thing to do.
brakdoo - Monday, November 12, 2018 - link
So, AMD and Intel are pushing parallel FP benchmarks.
Someone tell them that CPUs already lost this battle (despite AVX-512 for Intel) as HPC users almost always use accelerators for these use cases like CFD and weather forecast. Just look at the TOP500...
Spunjji - Monday, November 12, 2018 - link
There are still some cases where splitting the data out to separate accelerators takes more time (either in programming effort or just in terms of waiting around) than just performing the calculations on CPU, but you're right that the number of scenarios these performance comparisons apply to is shrinking rapidly.
saylick - Monday, November 12, 2018 - link
This does not bode well for Intel. 2x48 only manages to beat out 2x32 by 1.5x to 1.6x for non-AVX heavy workloads and for AVX-heavy workloads, the lead jumps to 3.4x. The fact that Intel is pitting 50% more cores against AMD in this comparison already accounts for that 1.5x to 1.6x performance uplift... With EPYC 2, AMD touts 2x the perf/W and 2x the cores AND with AT's interview with Mark Papermaster, AMD will support AVX2 without backing off on clocks. In a 2S head to head, Intel's 2x48 vs AMD's 2x64 will show AMD having the performance lead across the board at a power consumption level similar to a 2S EPYC 1 system.
edzieba - Monday, November 12, 2018 - link
The interesting slides are not the performance numbers, but the last two: vendors designing systems around the chips. Checking the latest TOP500, there are not yet AMD design wins listed (for Epyc, much less Rome), and none that I am aware of under construction or design (though that could have changed).
ilt24 - Monday, November 12, 2018 - link
@edzieba..."Checking the latest TOP500, there are not yet AMD design wins listed (for Epyc..."
#38 on the newest TOP500 list contains 5120 - EPYC 7501 (32 core) processors...note the processors are listed as Hygon Dhyana Epyc 7501 processors, which comes from a JV between AMD and a company created by the Chinese government.
edzieba - Tuesday, November 13, 2018 - link
Ah, I'd done a ctrl+f for AMD. I'd forgotten about the Hygon and THATIC legal dodges.
RogerAndOut - Tuesday, November 13, 2018 - link
WOW, Intel is spinning in just so many ways.
- Intel is having to commit what is in many ways its future 4-way server design to compete with AMD's current 2-way server. This is not a great story for the majority of deployments that are 1 or 2 socket systems and 4 socket systems are meant to be a high-profit area for Intel.
- Intel disabled SMT on the AMD system for the first test and many of the other tests indicate that the number of threads running was set at 1 per core. So SMT was not used where possible.
- Intel used their C/C++ compiler to compile the test suites where possible. There is no indication that they used the AMD Optimized compiler for the AMD systems.
- The header text states that microprocessor-dependent optimizations and non-specific optimizations were only carried out for the Intel processors.
All in all the info seems to have come from the same marketing team that released complete spin for the i9-9900K and before that the consumer version of the 28 core Xeon Platinum 8180.