Some of you may remember AMD announcing its "Torrenza" technology 10 years ago. The idea was to offer a fast, coherent interface between the CPU and various types of "accelerators" (via HyperTransport). It was one of the first initiatives to enable "heterogeneous computing".

We now have technology that could be labeled "heterogeneous computing", the most popular form being GPU computing. There have also been encryption, compression, and network accelerators, but the advantages of those accelerators were never really clear, as shuttling data back and forth to the CPU was in many cases less efficient than letting the CPU process it with optimized instructions. In the professional world, heterogeneous computing was mostly limited to HPC; in the consumer world it was a "nice to have".

But times are changing. The sensors of the Internet of Things, the semantic web, and the good old WWW are creating a massive, exponentially growing flood of data that cannot be stored and analyzed by traditional means. Machine learning offers a way of classifying all that data and finding patterns "automatically". As a result, we have witnessed a "machine learning renaissance", with quite a few breakthroughs. Google had to deal with this flood years before most other companies, and has released some of the Google Brain team's AI breakthroughs into the open source world, one example being TensorFlow. And when Google releases important technology as open source, we know we have to pay attention: when Google published the Google File System and BigTable papers in the mid-2000s, for example, the big data revolution of Hadoop, HDFS, and NoSQL databases erupted shortly after.

Big Data thus needs big brains: we need more processing power than ever. As Moore's Law is coming to an end (the end of CMOS scaling), we cannot expect much from process technology advancements. That processing power has to come from ASICs (see Google's TPU), FPGAs (see Microsoft's Project Catapult), and GPUs.

Those accelerators need a new "Torrenza": a fast, coherent interconnect to the CPU. NVIDIA was first with NVLink, but an open standard would be even better. IBM, on the other hand, was willing to share its CAPI interface.

To that end, Google, AMD, Xilinx, Micron, and Mellanox have joined forces with IBM to create a "coherent high performance bus interface" based on a new bus standard called the Open Coherent Accelerator Processor Interface (OpenCAPI). Capable of a 25 Gbit/s per lane data rate, OpenCAPI outpaces the current PCIe specification, which offers a maximum data transfer rate of 8 Gbit/s per PCIe 3.0 lane. We expect the total bandwidth to be a lot higher for quite a few OpenCAPI devices, as several OpenCAPI lanes will be bundled together.
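As a back-of-the-envelope illustration, the sketch below translates those per-lane rates into aggregate one-direction link bandwidth. The x4/x8/x16 bundle widths are our assumption, and encoding and protocol overhead are ignored, so treat the numbers as raw upper bounds rather than real-world throughput.

```python
# Illustrative only: raw one-direction bandwidth of OpenCAPI vs. PCIe 3.0
# links at a few hypothetical lane counts. Per-lane rates come from the
# article; the bundle widths (x4/x8/x16) are assumptions for illustration.

OPENCAPI_GBITS_PER_LANE = 25  # OpenCAPI: 25 Gbit/s per lane
PCIE3_GBITS_PER_LANE = 8      # PCIe 3.0: ~8 Gbit/s per lane

def link_bandwidth_gbytes(gbits_per_lane: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in gigabytes per second."""
    return gbits_per_lane * lanes / 8  # 8 bits per byte

for lanes in (4, 8, 16):  # assumed bundle widths
    ocapi = link_bandwidth_gbytes(OPENCAPI_GBITS_PER_LANE, lanes)
    pcie = link_bandwidth_gbytes(PCIE3_GBITS_PER_LANE, lanes)
    print(f"x{lanes:<2}  OpenCAPI: {ocapi:5.1f} GB/s | PCIe 3.0: {pcie:5.1f} GB/s")
```

Even at the same lane count, the raw per-lane advantage alone gives OpenCAPI roughly a 3x edge over PCIe 3.0.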

It is a win-win for everybody besides Intel. It is clear by now that IBM's OpenPOWER initiative is gaining a lot of traction and that IBM is deadly serious about offering an alternative to the Intel-dominated datacenter. IBM will implement the OpenCAPI interface in its POWER9 servers in 2017. Those POWER9s will not only have a very fast interface to NVIDIA GPUs (via NVLink), but also to Google's ASICs and Xilinx FPGA accelerators.

Meanwhile, this benefits AMD, as it gets access to an NVLink alternative for linking its Radeon GPUs to the upcoming Zen-based server processors. Micron can attach faster (and more profitable than DRAM) memory to the CPU, and Mellanox can do the same for networking. OpenCAPI is even more important for Xilinx's FPGAs, as a coherent interface can make FPGAs attractive for a much wider range of applications than they serve today.

And guess what: Dell EMC joined this new alliance just a few days ago. Intel has to come up with an answer...

Update: courtesy of commenter Yojimbo: "NVIDIA is a member of the OpenCAPI consortium, at the 'contributor level', which is the same level Xilinx has. The same is true for HPE (HP Enterprise)."

This is even bigger than we thought. Probably the biggest announcement in the server market this year.


Source: OpenCAPI

Comments

  • OEMG - Friday, October 14, 2016 - link

    I guess Intel's fine with good ol' PCIe. It's actually crazy how many players are coming up with their own interconnects.
  • patrickjp93 - Friday, October 14, 2016 - link

    Omnipath and Omniscale.
  • johnpombrio - Friday, October 14, 2016 - link

    Funny thing is that IBM is also in bed with NVidia using the NVLink and the P100 GPUs with a "5X faster than PCI" bus. So which bus is IBM pushing here? Intel completely owns the server market so exactly who is this for? The China server market?
  • SarahKerrigan - Saturday, October 15, 2016 - link

    Not quite. Non-x86 server sales account for a little under 15% of server revenue total, and IBM makes up most of that. That's several billion dollars per year worth of hardware.
  • iwod - Saturday, October 15, 2016 - link

This is a lot higher than I thought. Where is that data from? Considering x86 accounts for 95%+ of server units shipped, that means the 5% of shipments that are non-x86 represent 15% of revenue.
  • SarahKerrigan - Saturday, October 15, 2016 - link

    x86 is actually over 99% of units shipped. Non-x86 systems tend to be mainframes and commercial UNIX systems with a very high per-unit price (a single mainframe can run into the millions of dollars.) So, a much higher percentage of revenue than volume.

    IDC and Gartner both have data supporting this.
  • lefty2 - Friday, October 14, 2016 - link

So... basically, it's a faster bus than PCIe. Why put all that bullshit at the start of the article? This has nothing to do with machine learning or "heterogeneous computing".
  • diehardmacfan - Friday, October 14, 2016 - link

    The need for faster interconnects absolutely has to do with heterogeneous computing. This standard isn't for your average gaming GPU.
  • lefty2 - Friday, October 14, 2016 - link

No, it's not. It can be used in any application that needs higher bus speeds. Also, it could be used for high-end gaming GPUs (why not?)
  • Yojimbo - Friday, October 14, 2016 - link

High-end gaming GPUs don't need higher bus speeds. Besides, this is targeted towards data center servers. It is made with heterogeneous computing in mind. Even without considering the technical decisions made and how they affect various use cases, this consortium doesn't include Intel, so it's going to have to exist in a non-Intel ecosystem. You're not likely to see CAPI in consumer hardware any time soon.
