A Closer Look at Latency and Scaling
As explained in the Core 2 Duo launch review, Core 2 Duo does not add a memory controller to the processor. The memory controller is still part of the motherboard chipset that drives Core 2 Duo. Instead, Intel added features that perform intelligent look-aheads so that memory behaves as if it had lower latency. As you saw on pages 2 and 3, ScienceMark 2.0 shows the "intelligent look-aheads" in Core 2 Duo to be extremely effective, with Core 2 Duo memory now exhibiting lower apparent latency than AM2. However, not all latency benchmarks show the same results. Everest from Lavalys also shows latency improvements in the new CPU revisions, but its results are closer to what we would expect in evaluating Conroe. For that reason, our detailed latency benchmarks use both Everest 1.51.195, which fully supports the Core 2 Duo processor, and ScienceMark 2.0.
Latency, or how long a memory access takes, is not a static measurement. It varies with memory speed and generally improves (goes down) as memory speed increases. To better understand what is happening with memory accesses, we first looked at ScienceMark 2.0 latency on both AM2 and Conroe.
ScienceMark shows Conroe in the lead at DDR2-400, with latency of 45ns against 61ns for AM2. Core 2 Duo latency continues to decrease as memory speed increases, reaching about 30ns at DDR2-1067. The trend line for AM2 is steeper than that of Core 2 Duo, with AM2 latency improving at a rapid rate until the two are virtually the same at DDR2-800.
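As a rough illustration of what "steeper" means here, the short Python sketch below works only from the figures quoted above. It rests on two assumptions that are ours and not measured results: that Core 2 Duo latency falls in a straight line between DDR2-400 and DDR2-1067, and that "virtually the same at DDR2-800" can be treated as exactly equal. The function names are made up for this example.

```python
# Approximate ScienceMark 2.0 figures quoted in the text.
CONROE_400_NS = 45.0    # Core 2 Duo at DDR2-400
CONROE_1067_NS = 30.0   # Core 2 Duo at DDR2-1067
AM2_400_NS = 61.0       # AM2 at DDR2-400


def slope(x0, y0, x1, y1):
    """Change in latency (ns) per 1 MT/s of memory speed between two points."""
    return (y1 - y0) / (x1 - x0)


def interpolate(x0, y0, x1, y1, x):
    """Linear interpolation between two points (a simplifying assumption)."""
    return y0 + slope(x0, y0, x1, y1) * (x - x0)


conroe_slope = slope(400, CONROE_400_NS, 1067, CONROE_1067_NS)

# ASSUMPTION: straight-line behaviour gives an estimated Core 2 Duo value
# at DDR2-800; this is not a measured number.
conroe_800 = interpolate(400, CONROE_400_NS, 1067, CONROE_1067_NS, 800)

# ASSUMPTION: AM2 is taken to match that estimate exactly at DDR2-800.
am2_slope = slope(400, AM2_400_NS, 800, conroe_800)

print(f"Core 2 Duo: {conroe_slope * 100:.2f} ns per 100 MT/s")
print(f"AM2:        {am2_slope * 100:.2f} ns per 100 MT/s")
print(f"AM2 improves about {am2_slope / conroe_slope:.1f}x faster")
```

Under those assumptions AM2 latency falls between two and three times as fast as Core 2 Duo latency per step in memory speed, which is what the steeper trend line expresses.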
It is very interesting that ScienceMark shows lower latency on Core 2 Duo than on AM2, since we all know the on-chip AM2 controller has to be faster. We thought perhaps it was because all of the tested memory accesses could be contained in the shared 4MB cache of Core 2 Duo, but Alex Goodrich, one of the authors of ScienceMark, states that Version 2 is designed to test up to 16MB of memory, foreseeing the day of larger caches. In addition, he states that the Core 2 Duo prefetcher is clever enough to pick up all the patterns ScienceMark uses to "fool" hardware prefetchers. ScienceMark plans a revision with an algorithm that is harder to fool, but Alex commented that Conroe fooling their benchmark was "in itself a great indicator of performance".
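To make the idea of a prefetcher-unfriendly access pattern concrete, here is a minimal Python sketch of a generic pointer-chasing test. It is not the algorithm used by ScienceMark or Everest, neither of which is described here; the function names and sizes are our own. Each load depends on the result of the previous one, and the order is a random single cycle, so there is no fixed stride to detect. Note that in Python the interpreter overhead dominates the timing, so the printed number shows the method and is not a hardware latency figure.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list where chain[i] is the next slot to visit.

    Sattolo's algorithm produces a permutation that is one single cycle,
    so a walk starting anywhere touches all n slots before repeating.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain; every access depends on the one before it."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def time_per_step_ns(chain, steps):
    """Average wall-clock time per dependent access, in nanoseconds."""
    t0 = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - t0) * 1e9 / steps


if __name__ == "__main__":
    n = 1 << 16  # hypothetical working-set size, in slots
    chain = build_chain(n)
    print(f"{time_per_step_ns(chain, n):.1f} ns per step (includes interpreter overhead)")
```

A real latency test would use a compiled language and a working set well beyond the cache size, as the 16MB figure above suggests; the random single cycle is simply one common way to deny a prefetcher an easy pattern.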
Everest uses a different algorithm for measuring latency, and it shows the on-chip AM2 DDR2 controller in the lead at all memory speeds, with latency almost the same in the DDR2-400 to DDR2-533 range. However, the Everest trend lines are similar to those in ScienceMark, in that AM2 latency improves at a steeper rate than Core 2 Duo latency as memory speed increases.
The point of the latency discussion is that, as expected, AMD has much more opportunity for performance improvement from memory speed increases on AM2. If the lines were extended, Intel would eventually reach the point where it would have to move to an on-chip memory controller to further improve latency. This is not to take anything away from Intel's intelligent design on Core 2 Duo. Intel has found a solution that fixes a performance issue without requiring an on-chip controller - for now.
118 Comments
Calin - Tuesday, July 25, 2006
I hate to rain on your parade, but the E6x00 and X6800 (Extreme) desktop CPUs won't see a dual socket mainboard - for that you must use Xeons.
As for multisocket, it was a niche market when multicore was not available, it is (maybe even more so) when multicore is available. Quad core will reduce it even more for desktop use. As for Intel knowing multicore is the future, I think their quad core will be on market before AMD's quad core - and if you are worrying about performance, keep worrying - we can small talk about this and that all day long.
AMD is in a much weaker position now - they must sell processors at half (or less) the profit they sold them until now, and the future is grim if you regard their profits. They could survive a long way, but they again are the budget CPUs, the best choice for small money.
As for 64-bit, you are certainly right - just that right now, 64-bit is of little use on desktop, the operating systems suffer from drivers problems, 64-bit applications are few and far between. You might need 64-bit and profit from it, but you are a minority now.
Ingas - Tuesday, July 25, 2006
Maybe AMD is in trouble. But not because of Core 2 Duo, but because of Woodcrest.
AMD always said that only server processors give profit.
So ...
With Dell's AMD Now - maybe it's not trouble for AMD at all.
Calin - Wednesday, July 26, 2006
Dell will only build enough AMD gear as not to lose business with their customers that WANT AMD gear. Even with higher performance losses on 4 sockets, Xeons Core2Duo (which are faster to boot) might put a fair fight against AMD - and then customers will choose based on other things than performance.
I agree AMD Opteron scale better - but they start scaling from a lower performance
duploxxx - Wednesday, July 26, 2006
do you really think you're sure about that. compare the same speed of opteron vs woodcrest and you will talk different.. i know how it performs because i have a wood es system on my table. and i am not a big fan of hexus reviews but look at the site, the wood isn't so bright and shining knowing again it is a compare of 3.0 vs 2.6.
mesyn191 - Tuesday, July 25, 2006
For 2S systems Intel will have the lead til' K8L becomes available, but for 4S AMD will have Intel beat and that lead will only increase when K8L becomes available. They're definitely gonna be hurting profit wise, but they'll be doing better than they were when it was P4C vs. AXP and they got through that so I see no reason to worry about them going into bankruptcy before K8L comes out in volume.
sld - Tuesday, July 25, 2006
What is wrong with a desktop user looking at the performance of a desktop cpu?
When you can get a 4x4 at the same price/performance ratio curve as a Core 2 Duo, do please inform me.
I still believe AMD vs Intel is a David vs Goliath, although like the real David, AMD is beginning to get complacent with just a taste of power, and Core 2 is just what it needs to wake up and start dropping prices. :)
sld - Tuesday, July 25, 2006
I forgot to mention that Core 2 is worth a consideration over the K8, but if we really want to punish Intel for being the monster they are, we should institute a complete boycott over the purchase of their existing Netburst inventory. That should hurt them quite a bit...
Picture a scenario where new chips go straight out of the warehouse and into the embracing arm of a bulldozer. When it comes to that point do you think they will resort to giving the cpus away?
mattsaccount - Tuesday, July 25, 2006
Are the Super Pi scores on page 7 right? The text says Conroe wins everything, but the Super Pi bench is reversed (I'm guessing the colors are just backward)
Wesley Fink - Tuesday, July 25, 2006
Since the lower score is better on Super Pi (faster time) the scales are reversed - from zero at the top to 90 at the bottom. The colors and values are correct, just upside down so the lowest score (fastest) is on the top like the other charts. You apparently caught that while I was typing this explanation :)
highlandsun - Tuesday, July 25, 2006
Have you got 32M digit results for Super Pi? Curious to see if that will exceed Conroe's cache and therefore reflect the real memory bandwidth. Also, results for running two copies of Super Pi at once on each system.