
Nvidia Launches Tesla K40 and CUDA 6 with Unified Memory at SC13

As the world of HPC prepares to converge on Denver, Colorado, for Supercomputing 2013 (SC13) starting tomorrow, Nvidia has made a slew of announcements about its latest and greatest compute hardware and software: the Tesla K40 compute card and CUDA 6, although the CUDA 6 announcement actually came on Thursday. Since SC13 is where the Top500 supercomputer list receives its second update of the year, it will be interesting to see how these new Tesla K40 GPUs change the supercomputer landscape.

The Tesla K40 is the supercomputing equivalent of the GeForce GTX 780 Ti, in the sense that the two share the same GPU architecture and are similarly specced relative to their predecessors. The 780 Ti's story is a bit more complicated, but the key point is that both cards finally enable all 2880 CUDA cores on the GPU. This is something everyone has waited a long time to see, since it is technically the Kepler architecture in its full implementation. The GK110 GPU in the K40 is also paired with twice as much GDDR5 as the K20X (12GB versus 6GB), which means that many people who ran out of memory on the K20X will no longer have that concern.
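To put the doubled memory in perspective, here is a back-of-the-envelope sketch of the largest dense double-precision square matrix that fits on each card. It assumes the full 6GB and 12GB are usable and treats GB as GiB (2^30 bytes); in practice ECC and driver overhead reserve some of that, so real limits are lower. The helper name `max_square_matrix_dim` is hypothetical, chosen for illustration.

```python
import math

BYTES_PER_DOUBLE = 8  # one IEEE 754 double-precision value


def max_square_matrix_dim(memory_gib: float) -> int:
    """Largest n such that an n x n matrix of doubles fits in memory_gib.

    Assumption: every byte is available for the matrix. ECC and driver
    allocations reduce the usable amount on real cards.
    """
    total_bytes = int(memory_gib * 2**30)
    return math.isqrt(total_bytes // BYTES_PER_DOUBLE)


for name, gib in (("Tesla K20X", 6), ("Tesla K40", 12)):
    n = max_square_matrix_dim(gib)
    print(f"{name}: {gib} GiB -> up to a {n} x {n} matrix of doubles")
```

Because a square matrix grows quadratically with its dimension, doubling the memory raises the largest dimension by roughly the square root of two (about 41%), not by a factor of two.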

As you can tell from Nvidia's numbers above, you can expect a fairly significant improvement from the K40 over the K20X, even though the two cards are not that different. The performance gains come from the additional SMX cluster (15 versus 14), the higher base clock, and the new multiple boost clocks. And thanks to the wonders of a new stepping, the newer, faster card carries the same 235W power rating as its predecessor. There is also a slight bump in memory clock speed, though its effect will likely be modest at best.
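The SMX and clock differences translate fairly directly into peak throughput. The sketch below computes theoretical peak double-precision performance from SMX count and clock speed, assuming GK110's 64 FP64 units per SMX, one fused multiply-add (counted as 2 FLOPs) per unit per cycle, and the clocks and 235W board power from Nvidia's published specs as we understand them. These are theoretical ceilings, not benchmark results, and the helper name `peak_fp64_gflops` is hypothetical.

```python
# Assumed GK110 figures: 64 double-precision units per SMX, each retiring
# one fused multiply-add (counted as 2 FLOPs) per clock cycle.
FP64_UNITS_PER_SMX = 64
FLOPS_PER_FMA = 2

# Both cards are rated at 235W (assumed from published board specs).
BOARD_POWER_W = 235


def peak_fp64_gflops(smx_count: int, clock_mhz: float) -> float:
    """Theoretical peak double-precision GFLOPS for a Kepler GK110 part."""
    return smx_count * FP64_UNITS_PER_SMX * FLOPS_PER_FMA * clock_mhz / 1000.0


cards = {
    "K20X (14 SMX, 732 MHz)": peak_fp64_gflops(14, 732),
    "K40 (15 SMX, 745 MHz base)": peak_fp64_gflops(15, 745),
    "K40 (15 SMX, 875 MHz boost)": peak_fp64_gflops(15, 875),
}

for label, gflops in cards.items():
    # Peak GFLOPS divided by board power gives a rough efficiency ceiling.
    print(f"{label}: {gflops:.1f} GFLOPS, {gflops / BOARD_POWER_W:.2f} GFLOPS/W")
```

Under these assumptions the K40 gains about 9% in peak FP64 at base clock (1430.4 versus 1311.7 GFLOPS) within the same power rating, rising to about 28% when it holds its top boost clock.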

Nvidia currently has a number of OEM partners helping it deploy the Tesla K40 into supercomputers and other HPC applications, along with a small handful of customers already. Judging by the OEM list below, it has pretty much all of the important companies under one roof.

In terms of design wins, it looks like Nvidia may have pushed Intel out of TACC. While there have been no official announcements about Xeon Phi, Intel does not appear to have gained significant traction with it yet, and given the possible repositioning of Xeon Phi, that may not happen for a while.

Looking at these announcements, we can see that Nvidia continues to work closely with the scientific community to deploy its new Tesla GPUs for high performance computing. However, Nvidia is not only proud of how fast the supercomputers using its GPUs are; it is also very proud of how green they are, as with Tokyo Tech's Tsubame-KFC system (not a Kentucky Fried Chicken franchise). According to the Tokyo Tech website, the Tsubame 2.0 supercomputer received an update in September, and subsequent updates, including the experimental Tsubame-KFC system, are intended to pave the way for low-power, high-performance exascale computing.

Last but not least is the big CUDA announcement that came late last week. The newly announced CUDA 6 platform will obviously be used with the new Tesla K40 compute GPUs, and Nvidia has followed AMD's lead by implementing unified memory in CUDA 6. Letting the CPU and GPU share a single memory space removes the need for developers to manage separate host and device copies of their data, which should make developing on CUDA easier and may improve the efficiency of the system as a whole. While details on how this will work are still sparse, it is clear that unified memory is the way forward, and it will be interesting to see how things pan out for the rest of Nvidia's products.

We're very happy to see the implementation of unified memory and the 12GB Tesla K40. Hopefully these two developments will continue to push forward the advances in science and medicine made possible by GPGPU compute. We can only imagine what will be possible next with these improvements in power consumption and compute capability. Now we eagerly wait to see what AMD's next generation of FirePro cards will be able to deliver and whether they will ship with 8GB or 16GB. Things have finally gotten interesting, but AMD definitely has a lot of room to grow in HPC and compute compared to where Nvidia is today.