43 Comments

  • tipoo - Thursday, November 14, 2013 - link

    So this is just for developer simplicity; it doesn't really have the benefits of true unified memory and will still need to swap over PCI-E, which I've heard can take so much time that many GPGPU operations become pointless, since it would take less time to do the work on the CPU. Unified memory is really the way to go in the future, if only system memory could feed beefier GPUs.
  • tipoo - Thursday, November 14, 2013 - link

    Maybe that's the point of Volta with stacked DRAM.
  • Nenad - Friday, November 15, 2013 - link

    I think Volta would only change where the DRAM physically sits, but Maxwell should already support a single (shared) DRAM pool for both the CPU and GPU.

    Anyway, I see a big advantage of CUDA 6 in letting you write a program today that will take advantage of unified memory tomorrow, by simply skipping the behind-the-scenes copy when it detects real unified memory.
    For example, my current CUDA apps, where I manually copy to/from GPU memory, will work sub-optimally on a future Maxwell GPU with unified memory, since they will uselessly copy data from DRAM to that same DRAM. And while Nvidia can *try* to make the CUDA CPU-to-GPU procedures 'smart' on future unified-memory GPUs (skipping the copy even when an explicit copy is requested), that won't work every time, since sometimes my app relies on the fact that the GPU works on its own copy while I use the original 'CPU side' data for other things. And it's doubtful a compiler can detect such cases 100% of the time. On the other hand, with CUDA 6 there is no doubt the optimization to skip the copy can be made on a unified system.
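    A rough sketch of that difference, for illustration (assuming CUDA 6's cudaMallocManaged; the kernel, names and sizes are made up, and error checking is omitted):

        // Pre-CUDA 6: explicit device allocation and copies across PCI-E.
        #include <cuda_runtime.h>

        __global__ void scale(float *data, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) data[i] *= 2.0f;
        }

        void run_explicit(float *host, int n) {
            float *dev;
            cudaMalloc((void **)&dev, n * sizeof(float));
            cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);  // copy in
            scale<<<(n + 255) / 256, 256>>>(dev, n);
            cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);  // copy out
            cudaFree(dev);
        }

        // CUDA 6: one managed pointer, no explicit copies in the source. On a
        // discrete GPU the runtime still migrates the data over PCI-E behind
        // the scenes; on true unified-memory hardware the same code could skip that.
        void run_managed(int n) {
            float *data;
            cudaMallocManaged((void **)&data, n * sizeof(float));
            for (int i = 0; i < n; ++i) data[i] = (float)i;  // CPU writes directly
            scale<<<(n + 255) / 256, 256>>>(data, n);
            cudaDeviceSynchronize();  // CPU must wait before touching managed data again
            cudaFree(data);
        }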
  • Flunk - Thursday, November 14, 2013 - link

    It doesn't really have the benefits of true unified memory and will still need to swap over PCI-E


    How would you expect the data to get from system RAM to the GPU's RAM without PCI-E? "True" unified memory requires actually unifying the memory architecture, as is done with AMD Fusion APUs. You can't change hardware details with a software API.
  • andrewaggb - Thursday, November 14, 2013 - link

    Well, it should allow code written now to run faster if future hardware does have unified memory. The copies might not even be necessary, and this can be handled behind the scenes. At least that's what I get out of it.

    But I agree that AMD's approach with APUs and unified memory seems like where the big GPU compute gains will come from. At least it should be usable for all the scenarios where PCIe overhead is significant, though faster external GPUs might still be better for some tasks.

    Should be interesting.
  • extide - Thursday, November 14, 2013 - link

    With a discrete GPU that has physical memory separate from the CPU's, there will ALWAYS be a need to copy the memory back and forth. Whether it's done behind the scenes or not really doesn't matter in the performance sense. This is really a non-event in my book; sure, it's cool to have some things done automatically, but the performance gains come from having physically the same memory, not from hiding the copies.
  • Yojimbo - Friday, November 15, 2013 - link

    I suppose you do all your coding in assembly.
  • tipoo - Thursday, November 14, 2013 - link

    Which was exactly my point.
  • Kevin G - Thursday, November 14, 2013 - link

    nVidia is going the other direction: it is building an ARM CPU into their future GPUs. See Project Denver. This way they can offload as many of the necessary CPU tasks as possible to a local processor and sidestep the latency and bandwidth issues of PCI-e and the host CPU.
  • lours - Thursday, November 14, 2013 - link

    A big marketing shot to try to minimize the impact of AMD's hUMA.
  • AMDshit - Thursday, November 14, 2013 - link

    You mean AMD pUTA marketing vs. real-world CUDA?
  • Spunjji - Thursday, November 14, 2013 - link

    What a laugh you are! How'd you slip through the mods with a username like that? :D But seriously, this is fairly typical nVidia "no look at me" marketing. It usually works for them, so good luck to them. More tools for GPU programming is a good thing.
  • wwwcd - Thursday, November 14, 2013 - link

    Previously, stacked DRAM was announced as a Maxwell feature :D
  • AMDshit - Thursday, November 14, 2013 - link

    Cool story, AMD lunatic.
  • ddriver - Thursday, November 14, 2013 - link

    It will boost productivity, but not performance; under the table, memory is still being copied around.

    Maybe nvidia should stop pimping their proprietary closed tech and contribute a bit to something open, portable and platform-independent... like OpenCL. The effort would be much more appreciated than these attempts to perpetuate the fragmentation.
  • AMDshit - Thursday, November 14, 2013 - link

    So why isn't Mantle based on OpenCL?
  • DanNeely - Thursday, November 14, 2013 - link

    Because Mantle is a GPU rendering API, not a compute API?
  • ddriver - Thursday, November 14, 2013 - link

    Because Mantle is a GRAPHICS API and OpenCL is a COMPUTE API. But then again, looking at your screen name, I sit and wonder why I even bother acknowledging your pitiful existence :)
  • AMDshit - Thursday, November 14, 2013 - link

    OK. Why not OpenGL?
  • ddriver - Thursday, November 14, 2013 - link

    So, according to your brilliant logic, AMD should use OpenGL to implement a low-level alternative to OpenGL? And no, don't bother answering; it was a rhetorical question. If you don't know what that is, look it up.
  • Kevin G - Thursday, November 14, 2013 - link

    OpenGL was designed in the '90s and it carries some architectural baggage from that era. The Khronos Group has insisted on backwards compatibility for OpenGL as the API continues to evolve. Thus, in order to trim the fat from the API and optimize for modern architectures, a clean break is necessary.
  • invinciblegod - Thursday, November 14, 2013 - link

    Well, first, your screen name sort of makes anything you say on the topic suspect. Second, people blast AMD for Mantle being proprietary too (like techreport does). However, Mantle has a performance advantage. Does CUDA perform better than OpenCL?
  • ddriver - Thursday, November 14, 2013 - link

    Last time I checked, OpenCL performance was slightly better; CUDA is only faster at compiling kernels.
  • AMDshit - Thursday, November 14, 2013 - link

    On which AMD-paid software? http://media2.hpcwire.com/hpcwire/CUDA_OpenCL_comp...
  • Spunjji - Thursday, November 14, 2013 - link

    "Note that performance is, in most cases, close to equivalent".

    I'm sorry, you were trying to prove what now? :D
  • ddriver - Thursday, November 14, 2013 - link

    nvidia deliberately downplays OpenCL performance on their hardware to make CUDA artificially more "attractive" - because CUDA only works on nvidia, while OpenCL works almost everywhere. OpenCL compute on Radeons completely trashes nvidia GPUs in the same price range, regardless of whether they use OpenCL or CUDA.

    Also, this chart actually shows CUDA no better than OpenCL across the board; the few tests where CUDA scored better are only because the OpenCL implementation was incomplete and parts of the workload ran in software rather than in hardware.
  • Spunjji - Thursday, November 14, 2013 - link

    I'm not sure how you felt that was relevant, but I'm hopeful that a mod will find and destroy your posts before too long.
  • ddriver - Thursday, November 14, 2013 - link

    fat chance
  • Morawka - Thursday, November 14, 2013 - link

    I wouldn't touch any open standard that Apple has a hand in making/developing.
  • ddriver - Friday, November 15, 2013 - link

    I too despise Apple with a passion, but this is not proprietary. Apple can only contribute to OpenCL, and only with features the entire standards committee agrees with; Apple does not have the power to spoil OpenCL in any way.

    OpenCL works on pretty much every CPU with SIMD, on pretty much every desktop GPU, and on recent and future ARM chips; it can even be implemented in hardware on FPGAs.

    There is no reason to dismiss OpenCL because of Apple's involvement. Sure, Apple is a nasty corporation, but so is nvidia; they just were never in a position to be that insolent. Then again, look at the price of Titan...

    Fact is, Apple's involvement or not, OpenCL is the best thing available: it is an open standard, it is not vendor-limited, and it is beyond the power of any single corporation to spoil it.

    So if you refuse to "touch" OpenCL, you have even more reason to refuse to touch CUDA, which means willingly agreeing to miss out on the tremendous benefits of GPU compute - something I doubt you will do. Adobe has already scrapped CUDA for OpenCL, and thanks to its portability and vendor and platform independence, OpenCL is set to gain more ground and come with every application that requires serious number crunching.
  • Klimax - Saturday, November 16, 2013 - link

    So they should just support another closed proprietary API, where they gain no advantage and just give competition another advantage, right...
  • Maxwell_88 - Thursday, November 14, 2013 - link

    While this shouldn't change performance in any appreciable way, it is going to enable a whole class of applications to run on the GPU. For example, currently no data structure with embedded pointers works on the GPU, because the pointers become meaningless when the structure is copied over to the GPU.

    With the unified memory architecture, CUDA takes care of this translation, meaning you can now run applications that use complex data structures such as trees, linked lists, etc.
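    A minimal sketch of the linked-list case (assuming CUDA 6 managed allocations via cudaMallocManaged; names and values are illustrative, error checking omitted):

        #include <cstdio>
        #include <cuda_runtime.h>

        struct Node { int value; Node *next; };

        // The 'next' pointers written by the CPU are valid on the GPU as well,
        // so no pointer translation or deep copy is needed.
        __global__ void sum_list(Node *head, int *out) {
            int s = 0;
            for (Node *n = head; n != NULL; n = n->next) s += n->value;
            *out = s;
        }

        int main() {
            Node *head = NULL;
            for (int i = 1; i <= 4; ++i) {  // build the list on the CPU
                Node *n;
                cudaMallocManaged((void **)&n, sizeof(Node));
                n->value = i;
                n->next = head;
                head = n;
            }
            int *result;
            cudaMallocManaged((void **)&result, sizeof(int));
            sum_list<<<1, 1>>>(head, result);
            cudaDeviceSynchronize();
            printf("sum = %d\n", *result);  // 10
            return 0;
        }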
  • Morawka - Thursday, November 14, 2013 - link

    This guy is the only one who knows what he's talking about.
  • p1esk - Thursday, November 14, 2013 - link

    That was my thought as well.
  • Senti - Friday, November 15, 2013 - link

    Yup, except performance would be horrible. Even now it's possible to map GPU memory to the CPU and work with it like regular memory, with pointers etc., but no one sane would treat that mapped memory the same as system memory if they care about performance. It's not that you can't use unified pointers – you don't want to.
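    For reference, one form of this today is mapped (zero-copy) pinned memory: a sketch assuming cudaHostAlloc with cudaHostAllocMapped, which gives a kernel a pointer into host RAM. On a discrete card every access through that pointer crosses PCI-E, which is why treating it like ordinary device memory hurts performance:

        #include <cuda_runtime.h>

        __global__ void increment(int *p, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) p[i] += 1;  // each access goes over the bus on a discrete GPU
        }

        int main() {
            const int n = 1024;
            int *host_ptr, *dev_ptr;
            cudaSetDeviceFlags(cudaDeviceMapHost);  // allow host allocations to be mapped
            cudaHostAlloc((void **)&host_ptr, n * sizeof(int), cudaHostAllocMapped);
            for (int i = 0; i < n; ++i) host_ptr[i] = i;
            cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);  // GPU-visible alias
            increment<<<(n + 255) / 256, 256>>>(dev_ptr, n);
            cudaDeviceSynchronize();
            cudaFreeHost(host_ptr);
            return 0;
        }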
  • Hung_Low - Thursday, November 14, 2013 - link

    More GPU scaling? Stacking 8? Mamma mia!!! The high-end gaming community will go nuts again.
  • looncraz - Thursday, November 14, 2013 - link

    Uh huh... so basically it hides the copies from the developer so that the underlying mechanics can change at a later date. This is NOT unified memory; it is simply memory management.

    I've written things like this for almost two decades now to make my life a little easier... nothing innovative (though it may allow for some minimal performance CHANGES - gains here, losses there...).

    AMD has it right; nVidia is doing damage control...
  • Krysto - Thursday, November 14, 2013 - link

    This proves once again that Nvidia's chip engineers got lazy and fell behind the software engineers, who were working on CUDA 6. Maxwell was supposed to arrive at the same time as CUDA 6, but I think it has been delayed until 2H 2014.

    How does CUDA 6 compare to OpenCL 2.0?
  • Yojimbo - Friday, November 15, 2013 - link

    Aren't AMD's and NVidia's new architectures delayed by TSMC's process cost issues? From what seems to be floating around, the new architectures are designed to be (or were designed to be) implemented on a <=20nm process technology, but NVidia is unhappy with the cost of 20nm production. Thermal density scaling is not great, and costs are higher and projected to stay high, so I guess the main benefit is smaller die size? I think TSMC's 16nm FinFET node is supposed to have about the same areal density as the 20nm node, but with better thermal characteristics. I could have things wrong, though.
  • maximumGPU - Friday, November 15, 2013 - link

    Doesn't C++ AMP kinda do some of this already? You can define an array such that it can live in either CPU or GPU memory, and copying takes place behind the scenes.
    Anyway, even if it's not truly unified memory, the lower barrier to entry is a boon, and you can always opt to handle memory management yourself if performance is critical.
  • Senti - Friday, November 15, 2013 - link

    A new version of CUDA – we are so happy... not! The world today needs implementations of open standards like OpenCL 1.2, not proprietary ones.

    Also, performance with this looks like it is going to take a massive drop. Has anyone used their new unified OpenGL interop that doesn't require specifying an OpenGL context? Sure, it's nice to use. And how about performance when you plug in a second GPU? Yes, it's horrible.
  • polaco - Sunday, November 17, 2013 - link

    This is a piece of sh!t compared to AMD's hUMA implementation. hUMA is being standardized, while NVidia keeps this proprietary. So lame; this is a nice marketing notice, but it would have been tons better if it had been compared with hUMA, a REAL UNIFIED MEMORY ARCHITECTURE. They should be criticized for doing this kind of thing, not praised.
  • hpvd - Tuesday, February 18, 2014 - link

    The first Maxwell is here!!
    Does even this first, small one have unified memory IN HARDWARE?
    If yes, it would be very interesting to look at its efficiency compared to the software implementation used for Kepler...
