CPU Tests: Microbenchmarks

Core-to-Core Latency

As CPU core counts grow, the time it takes one core to access another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to the nearest core than to the furthest core. This holds especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer parts, can have variable access latency between cores. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and while we know there are competing tests out there, we feel ours most accurately reflects how quickly an access between two cores can happen.

When we first reviewed the 10-core Comet Lake processors, we noticed that a core (or two) seemed to take slightly longer to ping/pong than the others. We see the same pattern here again with the final core.

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features that reduce the time it takes a CPU to move from idle into a high-powered state. This means users get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices: if a system can turbo up quickly and turbo down quickly, it stays in its lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One issue with this technology is that frequency adjustments can sometimes be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency every few milliseconds (or seconds), then quick changes will be missed. Not only that, but as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
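To make the sampling problem concrete, here is a minimal simulation. It is only a sketch: the trace, the idle and turbo values, and the function names are all hypothetical and are not taken from any real monitoring tool. It builds a per-microsecond frequency trace containing one short turbo burst, then polls it at two different intervals.

```python
# Illustrative simulation only: all frequencies and timings are made-up values.

def make_trace(duration_us, burst_start_us, burst_len_us,
               idle_mhz=800, turbo_mhz=5000):
    """Return a per-microsecond list of frequencies with one short turbo burst."""
    burst_end_us = burst_start_us + burst_len_us
    return [
        turbo_mhz if burst_start_us <= t < burst_end_us else idle_mhz
        for t in range(duration_us)
    ]

def sample(trace, interval_us):
    """Read the trace once every interval_us microseconds, like a polling monitor."""
    return trace[::interval_us]

def peak_seen(samples):
    """Highest frequency the monitor actually observed."""
    return max(samples)

# 10 ms of activity with a 300 us turbo burst starting at 1.2 ms.
trace = make_trace(duration_us=10_000, burst_start_us=1_200, burst_len_us=300)

fine = sample(trace, 1)        # probing every microsecond
coarse = sample(trace, 1_000)  # probing every millisecond

print("peak seen at 1 us polling:", peak_seen(fine), "MHz")
print("peak seen at 1 ms polling:", peak_seen(coarse), "MHz")
```

In this made-up trace the burst falls entirely between two millisecond polls, so the coarse monitor never sees the processor leave its idle frequency, while the fine monitor sees the full turbo value.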

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, prompted by an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probe itself the workload that causes the turbo. The software can detect frequency adjustments on a microsecond scale, so we can see how quickly a system reaches those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
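The general idea of using the workload as the probe can be sketched as follows. This is not our Frequency Ramp tool; it is a rough illustration under stated assumptions, and the function names, chunk size, and tolerance are hypothetical. It times back-to-back chunks of fixed work: as the core clocks up, each chunk finishes sooner, so the chunk duration acts as a stand-in for frequency. Python's interpreter overhead makes this far noisier than a native implementation with microsecond resolution.

```python
import time

def busy_chunk(iterations):
    """A fixed amount of integer work; it finishes sooner as the core clocks up."""
    total = 0
    for i in range(iterations):
        total += i
    return total

def ramp_probe(chunks=200, iterations=2_000):
    """Time back-to-back chunks. Returns (ns_since_start, chunk_duration_ns) pairs."""
    samples = []
    start = time.perf_counter_ns()
    previous = start
    for _ in range(chunks):
        busy_chunk(iterations)
        now = time.perf_counter_ns()
        samples.append((now - start, now - previous))
        previous = now
    return samples

def time_to_peak_ns(samples, tolerance=1.05):
    """Elapsed time at which a chunk first runs within tolerance of the fastest chunk.

    The tolerance is an assumed noise allowance, not a measured value.
    """
    fastest = min(duration for _, duration in samples)
    for elapsed, duration in samples:
        if duration <= fastest * tolerance:
            return elapsed
    return samples[-1][0]

if __name__ == "__main__":
    time.sleep(0.5)  # let the core drop towards idle before probing
    results = ramp_probe()
    print("approx. time to reach peak speed:", time_to_peak_ns(results) / 1e6, "ms")
```

Because the probe is the only load on the core, there is no separate observer to disturb the turbo behaviour, which is the point of the approach described above.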

The Core i9-10850K ramps up extremely quickly from idle to peak turbo, in about 5 milliseconds. This is faster than the 16 ms we typically observe.


127 Comments


  • Hulk - Monday, January 4, 2021

    I loved the article. Well-written, very informative, and entertaining. Also little is ever written when it comes to binning. It's great to hear Ian's thoughts on this and the lengths Intel has been going to in order to stay competitive.
    Ian presented the facts of the case. We are the jury and make our own decisions.
  • simpleinhibition - Monday, January 4, 2021

    This review is only 6 months after launch. I remember a time when anandtech spent more time doing launch day articles and less time tweeting
  • mrvco - Monday, January 4, 2021

    Very diplomatic review, but Intel has become the Dodge of CPUs.
  • Everett F Sargent - Monday, January 4, 2021

    "For v2.1, we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software."

    Hmm, err, none of the CPU's in this review support any of the AVX-512 instruction set afaik.

    Pointless to compile explicit AVX-512 instructions or use the AVX-512 compiler flag. We know this because compiling something on an AVX-512 aware CPU will work on an AVX-512 machine but will surely crash on a non-AVX-512 CPU. So the best you can say in this review is that AVX2 was enabled as all of the tested CPU's support AVX2.

    Now when Rocket Lake comes out then you have an AVX-512 aware CPU. I really don't care what you all do. But if you are going to use/build custom code then use it in a pure AVX-512 compiled code. Four word versus eight word vectors (assuming 64-bit FP code). That then isolates the AVX-512 advantage which should be ~2X faster (eight/four) afaik.
  • Everett F Sargent - Monday, January 4, 2021

    Oh and the CPU speeds would have the same for all tests. Otherwise you will have to factor in those different CPU clocks. Yes to the slower clocks for AVX2/AVX-512 instructions as per the MHz offsets versus non-vectored code.
  • TeXWiller - Monday, January 4, 2021

    Sorry to nit-pick, Ian, but the original definition of the dark silicon was the area of the chip for which there is not enough power or thermal budget to power at the same time as the rest of the chip, instead that of structures that are purposefully added to improve thermal management. The paragraph makes the distinction unclear in my opinion.
  • anarfox - Monday, January 4, 2021

    I bit of an overreaction in the comments here. I have one of these with a noctua nh-d15 and it has no problem keeping it cool. And it's not like it have to ramp up the fans either. Is really quiet.

    An amd cpu might be a better choice of you can get one. But that's not an easy task.
  • Oxford Guy - Monday, January 4, 2021

    ‘While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds’

    Hogwash.

    Ultimately, very few non-enthusiasts read Anandtech. So, citing the people who are not your audience is plain fallacious.

    Secondly, no one needs to go to JDEC to gain stability, nor wants to, unless they’re in ECC land. If they didn’t bother to read their motherboard vendor’s supposed RAM list that shouldn’t be a ball and chain around our necks.

    Want JDEC? Fine. Do two rounds of tests. Otherwise, stick with the actual sweet spot in terms of price and performance. That is never JDEC.
  • Oxford Guy - Monday, January 4, 2021

    JEDEC, rather. Not even spelling the acronym is par for the course given how irrelevant it is for enthusiasts.

    As for ‘supposed’, that’s auto-defect.
  • Dug - Monday, January 4, 2021

    "‘While these comments make sense, ultimately very few users apply memory profiles (either XMP or other) as they require interaction with the BIOS, and most users will fall back on JEDEC supported speeds’"

    Ummmm..... no.
    I guess you guys haven't bought a computer in a long time from a vendor. Or even realize that people that do make their own, do apply it because every single guide on youtube, every tech site, every how to blog, shows it. So your assumption is just that, and not realistic.
