Fundamental Windows 10 Issues: Priority and Focus

Software normally runs on the assumption that all cores are equal, so that any thread can go anywhere and expect the same performance. As we’ve already discussed, the new Alder Lake design of performance cores and efficiency cores means that not everything is equal, and the system has to know which workload to put where for maximum effect.

To this end, Intel created Thread Director, which acts as the ultimate information depot for what is happening on the CPU. It knows what threads are where, what each of the cores can do, how compute heavy or memory heavy each thread is, and how the thermal hot spots and voltages factor in. With that information, it sends data to the operating system about how the threads are operating, with suggestions of actions to perform, or which threads can be promoted/demoted in the event of something new coming in. The operating system scheduler is then the ringmaster: it combines the Thread Director information with what it knows about the user, such as what software is in the foreground and which threads are tagged as low priority, and it is the operating system that actually orchestrates the whole process.

Intel has said that Windows 11 does all of this. The only thing Windows 10 doesn’t have is insight into the efficiency of the cores on the CPU. It assumes the efficiency is equal, but the performance differs – so instead of ‘performance vs efficiency’ cores, Windows 10 sees it more as ‘high performance vs low performance’. Intel says the net result of this will be seen only in run-to-run variation: there’s more of a chance of a thread spending some time on the low performance cores before being moved to high performance, and so anyone benchmarking multiple runs will see more variation on Windows 10 than Windows 11. But ultimately, the peak performance should be identical.

However, there are a couple of flaws.

At Intel’s Innovation event last week, we learned that the operating system will de-emphasise any workload that is not in user focus. For an office workload, or a mobile workload, this makes sense: if you’re in Excel, for example, you want Excel to be on the performance cores, and those 60 Chrome tabs you have open are all considered background tasks for the efficiency cores. The same goes for email, Netflix, or video games: what you are using there and then matters most, and everything else doesn’t really need the CPU.

However, this breaks down when it comes to more professional workflows. Intel gave the example of a content creator who starts a video export and, while it is processing, switches to editing some images. This puts the video export on the efficiency cores, while the image editor gets the performance cores. In my experience, the limiting factor in that scenario is the video export, not the image editor: what should take one unit of time on the P-cores now suddenly takes 2-3x as long on the E-cores while I’m doing something else. This extends to anyone who multi-tasks during a heavy workload, such as programmers waiting for the latest compile. Under this philosophy, the user would have to keep the important window in focus at all times. Beyond this, any software that spawns heavy compute threads in the background, without the potential for focus, would also be placed on the E-cores.

Personally, I think this is a crazy way to do things, especially on a desktop. Intel tells me there are three ways to stop this behaviour:

  1. Running dual monitors stops it
  2. Changing Windows Power Plan from Balanced to High Performance stops it
  3. There’s an option in the BIOS that, when enabled, means the Scroll Lock can be used to disable/park the E-cores, meaning nothing will be scheduled on them when the Scroll Lock is active.

(For those that are interested in Alder Lake confusing some DRM packages like Denuvo, #3 can also be used in that instance to play older games.)

Users that only have one window open at a time, or that aren’t relying on any serious all-core time-critical workload, won’t really be affected. For anyone else, it’s a bit of a problem, and the problems don’t stop there, at least for Windows 10.

Knowing my luck by the time this review goes out it might be fixed, but:

Windows 10 also uses a thread’s in-OS priority as a guide for core scheduling. For any users that have played around with Task Manager, there is an option to give a program a priority: Realtime, High, Above Normal, Normal, Below Normal, or Idle. The default is Normal. Behind the scenes this is actually a number from 0 to 31, where Normal is 8.
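
As a rough illustration of where those 0 to 31 numbers come from, the sketch below combines a process priority class with a thread's relative priority, using the base-priority values listed in Microsoft's scheduling priority documentation. The function and dictionary names are made up for this example and are not part of any Windows API, and the extra realtime-only levels are left out.

```python
# Base priority of each process priority class when the thread's own
# relative priority is "normal" (values per Microsoft's documentation).
CLASS_BASE = {
    "idle": 4,
    "below_normal": 6,
    "normal": 8,
    "above_normal": 10,
    "high": 13,
    "realtime": 24,
}

# Offset applied by the thread's relative priority level.
THREAD_OFFSET = {
    "lowest": -2,
    "below_normal": -1,
    "normal": 0,
    "above_normal": 1,
    "highest": 2,
}


def base_priority(process_class, thread_level="normal"):
    """Return the 0-31 base priority for a class/level pair (illustrative helper)."""
    is_realtime = process_class == "realtime"
    # The two extreme thread levels saturate instead of applying an offset.
    if thread_level == "idle":
        return 16 if is_realtime else 1
    if thread_level == "time_critical":
        return 31 if is_realtime else 15
    return CLASS_BASE[process_class] + THREAD_OFFSET[thread_level]


print(base_priority("normal"))                  # 8
print(base_priority("normal", "below_normal"))  # 7
print(base_priority("below_normal"))            # 6
```

On this mapping, a 7 is what a thread gets when it lowers itself one step inside a Normal-class process, while a whole process set to Below Normal sits at 6.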

Some software will naturally give itself a lower priority, usually a 7 (below normal), as an indication to the operating system of either ‘I’m not important’ or ‘I’m a heavy workload and I want the user to still have a responsive system’. This second reason is an issue on Windows 10, as with Alder Lake it will schedule the workload on the E-cores. So even if it is a heavy workload, moving to the E-cores will slow it down, compared to simply being across all cores but at a lower priority. This is regardless of whether the program is in focus or not.
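
The two rules described so far can be condensed into a toy model. This only restates the behaviour described in this article; it is not Windows' actual scheduling algorithm, and the function name and parameters are invented for illustration.

```python
NORMAL_PRIORITY = 8  # base priority of a Normal-class thread


def preferred_cores(in_focus, priority, windows10=True, high_performance_plan=False):
    """Toy model: return "P" or "E" for where a heavy thread would tend to land."""
    # Windows 10 rule from the article: anything below Normal priority
    # goes to the E-cores, regardless of focus.
    if windows10 and priority < NORMAL_PRIORITY:
        return "E"
    # Focus rule from the article: out-of-focus work is de-emphasised,
    # unless the High Performance power plan is selected.
    if not in_focus and not high_performance_plan:
        return "E"
    return "P"


# A renderer that lowered itself to priority 7, even while in focus:
print(preferred_cores(in_focus=True, priority=7))    # E
# A video export at Normal priority, left in the background:
print(preferred_cores(in_focus=False, priority=8))   # E
# The same export with the High Performance plan active:
print(preferred_cores(in_focus=False, priority=8, high_performance_plan=True))  # P
```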

Of the normal benchmarks we run, this issue flared up mainly with the rendering tasks like Cinebench, Corona, and POV-Ray, but also happened with y-cruncher and KeyShot (a visualization tool). In speaking to others, it appears that sometimes Chrome has a similar issue. The only way to fix these programs was to go into Task Manager and either (a) change the priority to Normal or higher, or (b) change the affinity to only P-cores. Software such as Process Lasso can be used to make sure that every time these programs are loaded, the priority is bumped up to Normal.
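
Option (b) comes down to an affinity bitmask with one bit per logical processor. The sketch below assumes a Core i9-12900K layout in which logical processors 0-15 are the hyperthreaded P-core threads and 16-23 are the E-cores; that numbering is an assumption and should be checked on the actual system. `app.exe` is a placeholder, and the helper names are made up. The resulting hex mask is in the form that the Windows `start /affinity` command accepts.

```python
def affinity_mask(logical_cpus):
    """Build a bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Assumed 12900K enumeration: verify on the real machine before relying on it.
P_CORE_THREADS = range(0, 16)   # 8 P-cores x 2 threads
E_CORE_THREADS = range(16, 24)  # 8 E-cores x 1 thread

p_mask = affinity_mask(P_CORE_THREADS)


def launch_command(exe, mask):
    """Format a cmd.exe line that starts `exe` at Normal priority on `mask`."""
    return f'start "" /normal /affinity {mask:X} {exe}'


print(f"{p_mask:X}")                       # FFFF
print(launch_command("app.exe", p_mask))   # start "" /normal /affinity FFFF app.exe
```

Launching a program this way only affects that one run, which is why a tool that reapplies the setting on every launch is the more convenient fix.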


  • web2dot0 - Thursday, November 4, 2021

    Who in their right mind thinks Intel is moving in the right direction?

    250W TDP?!?!?

    Apple came out with their 30W TDP that can rival desktop CPU parts.

    Comically embarrassing.
  • geoxile - Thursday, November 4, 2021

    Apple rivals desktop CPUs in SPECINT, which clearly loves memory bandwidth and cache. DDR5 alone boosted ADL's score in SPECINT MT by 33% from a DDR4 configuration. In Cinebench and geekbench the m1 pro and max are closer to workstation laptop processors. We'll see what happens with ADL mobile.
  • Ppietra - Thursday, November 4, 2021

    The 12600K has basically the same Geekbench score as the M1 Max, and yet its 10 cores consume 3 times more than the M1.
    On the 12900K just using the 8 E-cores consumes more than the M1 Max using the CPU at peak power. So we shouldn’t expect big miracles in mobile, unless Intel starts selling 90W chips.
    As for Cinebench, it will be difficult for Apple Silicon to come out on top until Apple implements some sort of Hyperthreading, Cinebench takes good advantage from it.
  • geoxile - Thursday, November 4, 2021

    The H55 segment will offer 8+8 at 45W and H45 will offer 6+8 at 35W, no need to compare the 12600k. We have models for how mobile uses power compared to desktop. They retain 80-90% of the performance at 1/3 to 1/4 the sustained power. 5900HS @ 35W cTDP (35-40W actual power) has around 85% the performance of the 5800X @110-120W in cinebench. The 11980HK at 45W has almost 90% the performance of the 11700k at 130-150W (non-AVX) in geekbench 5.
  • Ppietra - Thursday, November 4, 2021

    Closer to 15% drop in Geekbench, and probably at much higher package peak power draw than 45W, considering what Anandtech has measured for the 11980HK in Multithreaded tasks (around 75W).
  • geoxile - Thursday, November 4, 2021

    The 11980HK respects the configured PL/cTDP for the most part. It only hits 75W during the initial cold start. It uses 65W sustained power when configured to PL 65 and 45W when configured to 45W
    https://www.anandtech.com/show/16680/tiger-lake-h-...
    I screwed up using tom's results for geekbench, apparently it is at PL 65 unlike Anand's for the TGL test system. But it also scores 9254 vs anandtech's 11700k scoring 9853, so within around 94% performance of its desktop counterpart. I've seen some higher scores on GB itself but using "official" sources that's pretty close to 2x more efficient. I can't seem to find any real PL 45 results for GB5. Point is, scaling down isn't a problem, and ADL will no doubt scale down better thanks to E-cores and just overall better efficiency based on what we've already seen, like gaming efficiency according to igorslab and PL 150 making barely any difference in performance compared to PL 220. I think Intel is in a unique position since AMD doesn't have small cores anymore.
  • Ppietra - Friday, November 5, 2021

    What you are failing to realize is that Geekbench, due to its short tests nature, ends up spending a lot of time at peak performance and not at sustained performance.
    And no, the 11700k doesn’t score 9853 - you are looking at averages on the Geekbench site which are not reliable to make this sort of comparison. Notebookcheck geekbench score is close to 11300, while the 11980HK scores closer to 9700.
  • geoxile - Friday, November 5, 2021

    Geekbench runs for a few minutes afaik. The peak you're describing only lasts for a split second and quickly falls down to the sustained power over a few seconds to 30 seconds. And no, I'm not looking at averages from geekbench, I literally told you I'm using anand's score for the 11700k and tom's score for TGL mobile. https://www.anandtech.com/bench/CPU-2020/2972
  • Ppietra - Friday, November 5, 2021

    geoxile, Geekbench is a bunch of discrete tests with pauses in between.
    The value that you used is almost exactly the average in the Geekbench database, and we know that the 11700 gets much higher than that. You can also check that Anandtech never showed Geekbench results with that CPU in any of its reviews of the 11700. Don’t know why that value is there.
  • geoxile - Friday, November 5, 2021

    Describing a context switch to load the next bench as "pauses" is borderline gaslighting. It's a memory workload, not idling. PL2 on the 11980HK lasts for seconds from cold start at PL1 45.

    It's almost or it's exact. Anandtech lists those scores and I have no reason to doubt they copied them or made them up. Tom's has slightly higher scores at 10253 @ stock. That's a 4% variance, probably due to tom's using DDR4 3600 with tuned timings while anandtech used DDR4 3200. It's only with a 5Ghz OC toms can even break through 11000, let alone score 11300.
    https://www.tomshardware.com/reviews/intel-core-i7...
