Tonga’s Microarchitecture - What We’re Calling GCN 1.2

As we alluded to in our introduction, Tonga brings with it the next revision of AMD’s GCN architecture. This is the second such revision to the architecture, the last revision (GCN 1.1) being rolled out in March of 2013 with the launch of the Bonaire-based Radeon HD 7790. In the case of Bonaire AMD chose to keep the details of GCN 1.1 close to the chest, only finally going in-depth for the launch of the high-end Hawaii GPU later in the year. The launch of GCN 1.2 on the other hand is going to see AMD meeting enthusiasts half-way: we aren’t getting Hawaii-level details on the architectural changes, but we are getting an itemized list of the new features (or at least the features AMD is willing to talk about) along with a short description of what each feature does. Consequently Tonga may be a lateral product from a performance standpoint, but it is going to be very important to AMD’s future.

But before we begin, we do want to quickly remind everyone that the GCN 1.2 name, like GCN 1.1 before it, is unofficial. AMD does not publicly name these microarchitectures outside of development, preferring instead to treat the entire Radeon 200 series as relatively homogeneous and to call out feature differences where it makes sense. In lieu of an official name, and based on the iterative nature of these enhancements, we’re going to use GCN 1.2 to summarize the feature set.

AMD's 2012 APU Feature Roadmap. AKA: A Brief Guide To GCN

To kick things off we’ll pull out this old chestnut one last time: AMD’s HSA feature roadmap from their 2012 financial analysts’ day. Given HSA’s tight dependence on GPUs, this roadmap has offered a useful high-level overview of some of the features each successive generation of AMD GPU architectures will bring with it, and with the launch of the GCN 1.2 architecture we have finally reached what we believe is the last step in AMD’s roadmap: System Integration.

It’s no surprise then that one of the first things we find on AMD’s list of features for the GCN 1.2 instruction set is “improved compute task scheduling”. One of AMD’s major goals for their post-Kaveri APU was to improve the performance of HSA through various forms of overhead reduction, including faster context switching (something GPUs have always been poor at) and even GPU pre-emption. All of this would fit under the umbrella of “improved compute task scheduling” in AMD’s roadmap, though to be clear, AMD meeting us half-way on the architecture side means that they aren’t going into this level of detail just yet.

Meanwhile GCN 1.2’s other instruction set improvements are quite interesting. AMD’s description of 16-bit FP and Integer operations is actually quite detailed, and includes a very important keyword: low power. Briefly, PC GPUs have been centered around 32-bit mathematical operations for a number of years now, ever since desktop technology and transistor density eliminated the need for 16-bit/24-bit partial precision operations. All things considered, 32-bit operations are preferred from a quality standpoint as they are accurate enough for many compute tasks and virtually all graphics tasks, which is why PC GPUs were limited to (or at least optimized for) partial precision operations for only a relatively short period of time.

However 16-bit operations are still alive and well on the SoC (mobile) side. SoC GPUs are in many ways a 5-10 year old echo of PC GPUs in features and performance, while in other ways they’re outright unique. SoC GPUs are extremely sensitive to power consumption in a way that PC GPUs have never been, so while SoC GPUs can use 32-bit operations, they will in some circumstances favor 16-bit operations for power efficiency purposes. Despite the accuracy limitations of a lower precision, if a developer knows they don’t need the greater accuracy, then falling back to 16-bit means saving power and, depending on the architecture, also improving performance if multiple 16-bit operations can be scheduled alongside each other.
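To put a number on the accuracy trade-off described above, here is a minimal sketch that rounds values to IEEE 754 half precision (FP16) and single precision (FP32) using Python's standard `struct` module. The helper names (`to_fp16`, `to_fp32`, `rel_error`) are our own, and this only illustrates the storage formats; it says nothing about how GCN 1.2 or any SoC GPU actually executes 16-bit operations.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rel_error(exact: float, approx: float) -> float:
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(approx - exact) / abs(exact)


value = 0.1
half = to_fp16(value)
single = to_fp32(value)

# FP16 keeps an 11-bit significand, FP32 a 24-bit one, so the
# rounding error differs by several orders of magnitude.
print(f"FP16: {half!r}  relative error {rel_error(value, half):.2e}")
print(f"FP32: {single!r}  relative error {rel_error(value, single):.2e}")

# Above 2048 FP16 can no longer represent every integer; between
# 4096 and 8192 the representable values are 4 apart.
print(to_fp16(4097.0))
```

For something like a color value destined for an 8-bit-per-channel display, the FP16 error is usually invisible; for a large coordinate or an accumulated sum it may not be, which is the judgment call the developer has to make.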

Imagination's PowerVR Series 6XT: An Example of An SoC GPU With FP16 Hardware

To that end, the fact that AMD is taking the time to focus on 16-bit operations within the GCN instruction set is interesting, but not unexpected. If AMD were to develop SoC-class processors and wanted to use their own GPUs, then natively supporting 16-bit operations would be a logical addition to the instruction set for such a product. The power savings would be helpful for getting GCN into even smaller form factors, and with so many other GPUs supporting special 16-bit execution modes it would help to make GCN competitive with those other products.

Finally, data parallel instructions are the feature we have the least knowledge about. SIMDs can already be described as data parallel (it’s 1 instruction operating on multiple data elements in parallel), but obviously AMD intends to go past that. Our best guess would be that AMD has both a means and a need to have 2 SIMD lanes operate on the same piece of data, though why they would want to do this and what the benefits may be are not clear at this time.
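As a purely illustrative sketch of the distinction being drawn here, the snippet below models a SIMD as a list of lanes. `simd_apply` is ordinary data parallelism: one operation applied to every lane independently. `neighbor_exchange` is a hypothetical cross-lane step in which adjacent lanes swap values, so that two lanes end up working with the same piece of data. Both function names and the exchange pattern are our own assumptions for illustration; AMD has not described what its data parallel instructions actually do.

```python
from operator import add
from typing import Callable, List


def simd_apply(op: Callable[[int, int], int],
               a: List[int], b: List[int]) -> List[int]:
    """One instruction, many lanes: apply op to each pair of lane values."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return [op(x, y) for x, y in zip(a, b)]


def neighbor_exchange(lanes: List[int]) -> List[int]:
    """Hypothetical cross-lane step: each even/odd pair of lanes swaps values."""
    out = list(lanes)
    for i in range(0, len(lanes) - 1, 2):
        out[i], out[i + 1] = lanes[i + 1], lanes[i]
    return out


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Classic SIMD: lane i only ever sees a[i] and b[i].
print(simd_apply(add, a, b))

# With a cross-lane exchange, lane i can also see its neighbor's value,
# so each pair of lanes computes the same pairwise sum.
print(simd_apply(add, a, neighbor_exchange(a)))
```

Without an exchange step like this, sharing data between lanes generally means a round trip through some form of shared memory, which is one plausible reason a GPU vendor might want such instructions.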


  • TiGr1982 - Thursday, September 11, 2014 - link

    BTW, is Tonga the only new GPU AMD has to offer in 2014?
    (if I'm not mistaken, the previous one from AMD, Hawaii, was released back in October 2013, almost a year ago)
    Does anybody know?
  • HisDivineOrder - Thursday, September 11, 2014 - link

    The thing is the moment I heard AMD explaining how Tonga was too new for current Mantle applications, I was like, "And there the other shoe is dropping."

    The promise of low level API is that you get low level access and the developer gets more of the burden of carrying the optimizations for the game instead of a driver team. This is great for the initial release of the game and great for the company that wants to have less of a (or no) driver team, but it's not so great for the end user who is going to wind up getting new cards and needing that Mantle version to work properly on games no longer supported by their developer.

    It's hard enough getting publishers and/or developers to work on a game a year or more after release to fix bugs that creep in and in some cases hard to get them to bother with resolution switches, aspect ratio switches, the option to turn off FXAA, the option to choose a software-based AA of your choice, or any of a thousand more doohickeys we should have by now as bog-standard. Can you imagine now relying on that developer--many of whom go completely out of business after finishing said title if they happen to work for Activision or EA--to fix all the problems?

    This is why a driver team is better working on it. Even though the driver team may be somewhat removed from the development of the game, the driver team continues to have an incentive to want to fix that game going forward, even if it's a game no longer under development at the publisher. You're going to be hard pressed to convince Bobby Kotick at Activision that it's worth it to keep updating versions of games older than six months (or a year for Call of Duty) because at a certain point, they WANT you to move on to another game. But nVidia and AMD (and I guess Intel?) want to make that game run well on next gen cards to help you move.

    This is where Mantle is flawed and where Mantle will never recover. Every time they change GCN, it's going to wind up with a similar problem. And every time they'll wind up saying, "Just switch to the DX version." If Mantle cannot be relied upon for the future, then it is Glide 2.0.

    And why even bother at all? Just stick with DirectX from the get-go, optimize for it (as nVidia has shown there is plenty of room for improvement), and stop wasting any money at all on Mantle since it's a temporary version that'll rapidly be out of date and unusable on future hardware.
  • The-Sponge - Thursday, September 11, 2014 - link

    I do not understand how they got their R9 270x temperatures, my OC'd R9 270x never even comes close to the temps they got....
  • mac2j - Friday, September 12, 2014 - link

    It's great that they've caught up with H.264 on hardware and the card otherwise looks fine. The bottom line for me, though, is that I don't see the point of buying a card now without H.265 on hardware and an HDMI 2.0 port - 2 things Maxwell will bring this year. I haven't heard what AMD's timetable is there though.
  • P39Airacobra - Friday, October 17, 2014 - link

    It really irritates me that they are making these cards throttle to keep power and temps down! That is pathetic! If you can't make the thing right just don't make it! Even if it throttles .1mhz it should not be tolerated! We pay good money for this stuff and we should get what we pay for! It looks like the only AMD cards worth anything are the 270's and under. It stinks you have to go Nvidia to get more power! Because Nvidia really rapes people with their prices! But I must say the GTX 970 is priced great if it is still around $320. But AMD should have never even tried with this R9 285! First of all when you pay that much you should get more than 2GB. And another thing the card is pretty much limited to the performance of the R9 270's because of the V-Ram count! Yeah the 285 has more power than the 270's, But whats the point when you do not have enough V-Ram to take the extra power were you need a card like that to be? In other words if you are limited to 1080p anyway, Why pay the extra money when a R7 265 will handle anything at 1080p beautifully? This R9 285 is a pointless product! It is like buying a rusted out Ford Pinto with a V-8 engine! Yeah the engine is nice! But the car is a pos!
  • P39Airacobra - Friday, January 9, 2015 - link

    (QUOTE) So a 2GB card is somewhat behind the times as far as cutting edge RAM goes, but it also means that such a card only has ¼ of the RAM capacity of the current-gen consoles, which is a potential problem for playing console ports on the PC (at least without sacrificing asset quality).

    (SIGH) So now even reviewers are pretending the consoles can outperform a mid range GPU! WOW! How about telling the truth like you did before you got paid off! The only reason a mid range card has problems with console ports is because they are no longer optimized! They just basically make it run on PC and say xxxx you customers here it is! And no the 8GB on the consoles are used for everything not for only V-Ram! We are not stupid idiots that fall for anything like the idiots in Germany back in the 1930's!
