AMD 7th Gen Bristol Ridge and AM4 Analysis: Up to A12-9800, B350/A320 Chipset, OEMs first, PIBs Later
by Ian Cutress on September 23, 2016 9:00 AM EST
Over the last two weeks, AMD officially launched its 7th Generation Bristol Ridge processors as well as the new AM4 socket and related chipsets. The launch was somewhat muted, as the initial release targets only the big system OEMs and system integrators, such as Lenovo, HP, Dell and others. For users wanting to build their own systems, ‘Product-in-Box’ units (called PIBs) will come at the end of the year. We held off on covering the announcement because the launch and briefings left a number of questions unanswered as to the potential matrix of configurations, the specifications of the hardware and how it all connects together. We got a number of answers, so let’s delve in.
The CPUs
The seven APUs and one CPU being launched for OEM systems span from a high-frequency A12 part using the 7th Generation microarchitecture (we call it Excavator v2) down to the A6. They all build on the Bristol Ridge notebook parts that were launched earlier in the year, but are focused on the desktop this time around. AMD essentially skipped the 6th Gen, Carrizo, on the desktop as the design was heavily mobile focused. We ended up with one CPU, the Athlon X4 845 (which we reviewed), with DDR3 support but no integrated graphics. Using the updated 28nm process from TSMC, AMD was able to tweak the microarchitecture and allow full-on APUs for desktops using a similar design.
The full list of processors is as follows:
AMD 7th Generation Bristol Ridge Processors
Model | Modules / Threads | CPU Base / Turbo (MHz) | GPU | GPU Base / Turbo (MHz) | TDP |
A12-9800 | 2M / 4T | 3800 / 4200 | Radeon R7 | 800 / 1108 | 65W |
A12-9800E | 2M / 4T | 3100 / 3800 | Radeon R7 | 655 / 900 | 35W |
A10-9700 | 2M / 4T | 3500 / 3800 | Radeon R7 | 720 / 1029 | 65W |
A10-9700E | 2M / 4T | 3000 / 3500 | Radeon R7 | 600 / 847 | 35W |
A8-9600 | 2M / 4T | 3100 / 3400 | Radeon R7 | 655 / 900 | 65W |
A6-9500 | 1M / 2T | 3500 / 3800 | Radeon R5 | 720 / 1029 | 65W |
A6-9500E | 1M / 2T | 3000 / 3400 | Radeon R5 | 576 / 800 | 35W |
Athlon X4 950 | 2M / 4T | 3500 / 3800 | - | - | 65W |
AMD’s mainstream processors will now hit a maximum of 65W in their official thermal design power (TDP), with the launch offering a number of 65W and 35W parts. There is the potential to offer CPUs with a configurable TDP. However, the equivalent feature on older parts that supported 65W/45W modes was seldom used, and chances are we will see OEMs stick with the default design power windows here. The naming scheme has also changed: any 35W part now has an ‘E’ at the end of the processor name, allowing for easier identification.
As part of this review, we were able to snag a few extra configuration specifications for each of the processors, including the number of streaming processors in each, base GPU frequencies, base Northbridge frequencies (more on the NB later), and confirmation that all the APUs launched will support DDR4-2400 at JEDEC sub-timings.
AMD 7th Generation 65W Bristol Ridge Processors
Model | Modules / Threads | CPU Base / Turbo (MHz) | GPU SPs | GPU Base / Turbo (MHz) | Northbridge Base (MHz) |
A12-9800 | 2M / 4T | 3800 / 4200 | 512 | 800 / 1108 | 1400 |
A10-9700 | 2M / 4T | 3500 / 3800 | 384 | 720 / 1029 | 1400 |
A8-9600 | 2M / 4T | 3100 / 3400 | 384 | 655 / 900 | 1300 |
A6-9500 | 1M / 2T | 3500 / 3800 | 384 | 720 / 1029 | 1400 |
Athlon X4 950 | 2M / 4T | 3500 / 3800 | - | - | 1400 |
AMD 7th Generation 35W Bristol Ridge Processors
Model | Modules / Threads | CPU Base / Turbo (MHz) | GPU SPs | GPU Base / Turbo (MHz) | Northbridge Base (MHz) |
A12-9800E | 2M / 4T | 3100 / 3800 | 512 | 655 / 900 | 1300 |
A10-9700E | 2M / 4T | 3000 / 3500 | 384 | 600 / 847 | 1300 |
A6-9500E | 1M / 2T | 3000 / 3400 | 256 | 576 / 800 | 1300 |
The A12-9800 at the top of the stack is an interesting part on paper. If we do a direct comparison with the previous high-end AMD APUs, the A10-7890K, A10-7870K and A10-7860K, a lot of positives end up on the side of the A12.
High-End AMD APU Comparison
Spec | A12-9800 | A10-7890K | A10-7870K | A10-7860K | A10-9700 |
MSRP | - | $165 | $137 | $117 | - |
Platform | Bristol Ridge | Kaveri Refresh | Kaveri Refresh | Kaveri Refresh | Bristol Ridge |
uArch | Excavator v2 | Steamroller | Steamroller | Steamroller | Excavator v2 |
Threads | 2M / 4T | 2M / 4T | 2M / 4T | 2M / 4T | 2M / 4T |
CPU Base Freq (MHz) | 3800 | 4100 | 3900 | 3600 | 3500 |
CPU Turbo Freq (MHz) | 4200 | 4300 | 4100 | 4000 | 3800 |
IGP SPs | 512 | 512 | 512 | 512 | 384 |
GPU Turbo Freq (MHz) | 1108 | 866 | 866 | 757 | 1029 |
TDP | 65W | 95W | 95W | 65W | 65W |
L1-I Cache | 192 KB | 192 KB | 192 KB | 192 KB | 192 KB |
L1-D Cache | 128 KB | 64 KB | 64 KB | 64 KB | 128 KB |
L2 Cache | 2 MB | 4 MB | 4 MB | 4 MB | 2 MB |
DDR Support | DDR4-2400 | DDR3-2133 | DDR3-2133 | DDR3-2133 | DDR4-2400 |
PCIe 3.0 | x8 | x16 | x16 | x16 | x8 |
Chipsets | B350, A320, X/B/A300 | A88X, A78, A68H | A88X, A78, A68H | A88X, A78, A68H | B350, A320, X/B/A300 |
The A12-9800 has a wider frequency range than the A10-7870K (3.8-4.2 GHz rather than 3.9-4.1 GHz), and it adds the newer Excavator v2 microarchitecture, a larger L1 data cache, AVX 2.0 support and a much higher integrated graphics frequency (1108 MHz vs. 866 MHz), all while coming in at 30W less TDP. The 30W TDP drop is the most surprising: we’re essentially getting better than the previous A10-class performance at lower power, which is most likely why AMD started naming the best APU in the stack an ‘A12’. Basically, the A12-9800 APU will be an extremely interesting one to review given the smaller L2 cache but faster graphics and DDR4 memory.
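To put numbers on that comparison, here is a minimal sketch that recomputes the deltas between the A12-9800 and the A10-7870K from the table figures. The dictionary layout and helper names are our own, purely for illustration, and the result says nothing about real-world performance, only about the paper specifications.

```python
# Paper specifications copied from the comparison table (MHz and watts).
# The dictionary keys and helper names are illustrative, not AMD terminology.
a12_9800 = {"base": 3800, "turbo": 4200, "gpu_turbo": 1108, "tdp": 65}
a10_7870k = {"base": 3900, "turbo": 4100, "gpu_turbo": 866, "tdp": 95}


def cpu_range_mhz(part):
    """Width of the base-to-turbo window in MHz."""
    return part["turbo"] - part["base"]


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


range_a12 = cpu_range_mhz(a12_9800)
range_a10 = cpu_range_mhz(a10_7870k)
gpu_gain = percent_gain(a12_9800["gpu_turbo"], a10_7870k["gpu_turbo"])
tdp_saving = a10_7870k["tdp"] - a12_9800["tdp"]

print(f"CPU frequency window: {range_a12} MHz vs {range_a10} MHz")
print(f"GPU turbo increase:   {gpu_gain:.1f}%")
print(f"TDP reduction:        {tdp_saving} W")
```

On these figures the A12-9800 has twice the base-to-turbo window, a GPU turbo clock roughly 28% higher, and a 30W lower TDP rating.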
A Wild Overclocker Appears!
Given that systems with the new APUs have technically been released for a couple of weeks, some vendors have had their internal enthusiasts play around with the platform. Bearing in mind that AMD has not announced any formal overclocking support on these new APUs, NAMEGT, a South Korean overclocker with ties to ASUS, has pushed the A12-9800 APU to 4.8 GHz by adjusting the multiplier. To do this, he used an unreleased ASUS Octopus AM4 motherboard and AMD’s 125W Wraith air cooler (which will presumably be bundled with PIBs later in the product cycle).
NAMEGT ran this setup on multithreaded Cinebench 11.5 and Cinebench 15, scoring 4.77 and 380 respectively at the 4.8 GHz overclock. Comparing this to our Bench database results, we see the following:
For Cinebench 15, this overclocked score puts the A12-9800 above the Haswell Core i3-4360 and the older AMD FX-4350, but below the newer Skylake i3-6100TE. The Athlon X4 845 at stock frequencies scored 314 while running at 3.5 GHz, which would suggest that a stock A12-9800 at 3.8 GHz would fall around the 340 mark.
(Since writing this, a preview by Korean website Bodnara, using the A12-9800 in a GIGABYTE motherboard, scored 334 for a stock Cinebench 15 multithreaded test and 96 for the single threaded test. We've added this result for perspective.)
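As a rough sanity check, here is a minimal sketch of the linear clock scaling behind the 340 estimate, compared against the 334 stock result reported by Bodnara. It assumes the Cinebench 15 multithreaded score scales directly with base frequency, which ignores turbo behavior, memory differences and the changes between the two chips. The function name `scale_score` is ours, for illustration only.

```python
def scale_score(score, from_mhz, to_mhz):
    """Scale a benchmark score linearly with clock speed (a simplifying assumption)."""
    return score * to_mhz / from_mhz


# Athlon X4 845: 314 points at 3.5 GHz (our Bench result).
estimate = scale_score(314, 3500, 3800)

# Stock A12-9800 result reported by Bodnara.
measured = 334

# How far the linear estimate overshoots the measured score, in percent.
gap_pct = (estimate - measured) / measured * 100.0

print(f"Linear estimate at 3.8 GHz: {estimate:.1f}")
print(f"Measured stock score:       {measured}")
print(f"Estimate overshoot:         {gap_pct:.1f}%")
```

The linear estimate lands at roughly 341, about 2% above the measured 334, so simple clock scaling from the Athlon X4 845 is a reasonable first approximation here.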
When we previously tested the Excavator architecture for desktop on the 65W Athlon X4 845, overclocking was a nightmare, with stability being a large issue. At the time, we suspected that because the core design was focused on 15W, moving beyond 65W was perhaps a bit of a stretch for the design at hand. This time around, as we reported before, Bristol Ridge is using an updated 28nm process over Carrizo, which may have a hand in the better overclocking result seen here.
When we asked AMD about overclocking details on the new APUs, the return reply was along the lines of ‘No OEM systems at this time will be unlocked, and no official comment on the individual units. More details will be released closer to the platform launch for DIY users’.
Alexvrb - Sunday, September 25, 2016
Geekbench is trash at comparing across different architectures. It makes steaming piles look good. Only using SSE (first gen, ancient) on x86 processors would certainly be a part of the puzzle regarding Geekbench results. Thanks, Patrick.
Not to take anything away from Apple's cores. I wouldn't be surprised if they have better performance per WATT than Skylake. Perf/watt is kind of a big deal for mobile, and Apple (though I don't care for them as a company) builds very efficient processor cores. With A10 using a big.LITTLE implementation of some variety, they stand to gain even more efficiency. But in terms of raw performance? Never rely on Geekbench unless maybe you're comparing an A9 Apple chip to an A10 or something. MAYBE.
ddriver - Monday, September 26, 2016
Hey, it is not me who uses crap like geekbench and sunspider to measure performance, it is sites like AT ;)
BurntMyBacon - Monday, September 26, 2016
@ddriver: "Hey, it is not me who uses crap like geekbench and sunspider to measure performance, it is sites like AT ;)"
LOL. My gut reaction was to call you out on blame shifting until I realized ... You are correct. There hasn't exactly been a lot of benchmark comparison between ARM and x86. Of course, there isn't much out there with which to compare either so ...
patrickjp93 - Monday, September 26, 2016
Linpack and SAP. Both are massive benchmark suites that will give you the honest to God truth, and the truth is ARM is still 10 years behind.
patrickjp93 - Monday, September 26, 2016
They use it in context and admit the benchmarks are not equally optimized across architectures.
patrickjp93 - Monday, September 26, 2016
It doesn't even use SSE. It uses x86_64 and x87 scalar float instructions. It doesn't even give you MMX or SSE. That's how biased it is.
patrickjp93 - Monday, September 26, 2016
Just because you write code simply enough using good modern form and properly align your data and make functions and loops small enough to be easily optimized does not mean GCC doesn't choke. Mike Acton gave a great lecture at CppCon 2014 showing various examples where GCC, Clang, and MSVC choke.
Define very good.
Define detailed analysis. Under what workloads? Is it more efficient for throughput or latency (because I guarantee it can't be both)?
Yes, Geekbench uses purely scalar code on x86 platforms. It's ludicrously pathetic.
It's 8x over scalar, and that's where it matters, and it can even be better than that because of loop Muop decreases which allow the loops to fit into the detector buffers which can erase the prefetch and WB stages until the end of the loop.
No, they're not more powerful. A Pentium IV is still more powerful than the Helio X35 or Exynos 8890.
No, those are select benchmarks that are more network bound than CPU bound and are meaningless for the claims people are trying to make based on them.
BurntMyBacon - Monday, September 26, 2016
@ddriver: "I've been using GCC mostly, and in most of the cases after doing explicit vectorization I found no perf benefits, analyzing assembly afterwards revealed that the compiler has done a very good job at vectorizing wherever possible."
It's not just about vectorizing. I haven't taken a look at Geekbench code, but it is pretty easy to under-utilize processor resources. Designing workloads to fit within a processor's cache for repetitive operations is a common way to optimize. It does, however, leave a processor with a larger cache underutilized for the purposes of the workload. Similar examples can be found for wide vs narrow architectures and memory architectures feeding the processor. Even practical workloads can be done various ways that are much more or less suitable to a given platform. Compression / Encoding methods are some examples here.
BurntMyBacon - Monday, September 26, 2016
@patrickjp93: "Yes you can get 5x the performance by optimizing. Geekbench only handles 1 datum at a time on Intel hardware vs. the 8 you can do with AVX and AVX2. Assuming you don't choke on bandwidth, you can get an 8x speedup."
If you have a processor with a large enough cache to keep a workload almost entirely in cache and another with far less cache that has to access main memory repetitively to do the job, the difference can be an order of magnitude or more. Admittedly, the type of workload that is small enough to fit in any processor cache isn't common, but I've seen cases of it in benchmarks and (less commonly) in scientific applications.
patrickjp93 - Tuesday, September 27, 2016
Heh, they're usually based on Monte Carlo simulations if they can.