New NVIDIA Quadro Family Plans to Heavily Monetize Pascal GPUs

NVIDIA’s approach to the GeForce / Quadro / Tesla line-up has changed a lot over the past couple of years. The sequence of “launch as GeForce, downclock as Tesla, optimize and launch as Quadro” turned into “launch as Tesla, optimize as GeForce and be reliable as Quadro”. With Pascal, the story is almost the same. NVIDIA introduced GP100 as a Tesla in April 2016, followed by the GP102 chip as Titan X (no longer branded as GeForce), Quadro P6000 and Tesla P40. The GP104/106/107 chips did not follow the same sequence, with only GP104 debuting as Quadro P5000 and Tesla P4.

The second day of SolidWorks World saw the debut of the complete Pascal-based Quadro family, including the original GP100 chip, branded simply as “Quadro GP100”. GP104, GP106 and GP107 arrive in new form factors as well. From top to bottom, the Quadro line-up is now complete… and the top is quite spectacular. Pascal is, without any doubt, NVIDIA’s most powerful GPU architecture to date: quick, precise and rarely prone to failure (based on RMA numbers from several SIs I talk with). Most technical specifications of the new Quadro series are known, and they are impressive.

Altogether, the Quadro family is refreshed with eight Pascal-based cards. Beyond the architectural improvements over the previous Maxwell and Kepler families, the P series brings DisplayPort 1.4 with support for 4K@120Hz and 5K@60Hz monitors, along with a leap forward for VR developers in the form of Simultaneous Multi-Projection (SMP). The Quadro ecosystem also gets plenty of improvements: a new SDK, Iray updates and Iray integration with NVIDIA’s DGX-1 server. Pascal-based Quadro also brings an update to the OptiX ray-tracing engine and to the way Mental Ray works in Autodesk’s Maya.

NVIDIA Pascal-based Quadro Family

Taking a more detailed look at the cards, we start at the entry level, which brings GP107 silicon into the mix. There are three cards based on GP107, which is produced at Samsung’s foundry on the 14nm FinFET process. This is the first non-TSMC manufactured chip since NVIDIA ditched the TSMC/UMC mix in the late 2000s. The first GP107 part is the Quadro P400, which replaces the Kepler-based K420. While the P400’s TDP is rated at only 30W, its 256 CUDA cores should deliver more than double the performance of the K420. Also note that the board now features GDDR5 rather than the previously used Low-Power DDR3 memory. With this change, NVIDIA (finally) no longer has a part in its current line-up that uses anything other than GDDR5, GDDR5X or HBM2 memory.

The next card is the Quadro P600, which replaces the K620. Just like the P400, it features 2GB of GDDR5 memory, and it is rated at a 40W TDP. The P600 has 384 cores, resulting in 1.195 TFLOPS of compute performance. The final card carrying the GP107 chip is the P1000, which brings 640 cores, 1.89 TFLOPS of compute performance and 4GB of GDDR5 memory. The card consumes 47 watts under full load and is the last half-height, single-slot card in the Quadro Pascal family. With this introduction, the GP107 GPU replaces a mix of Kepler and Maxwell GPUs (some Quadro K-cards carried the Maxwell architecture). NVIDIA can now retire its orders of Kepler and Maxwell silicon, including what it had to keep in supply for the “long life” Tesla cards (1st gen Tesla VCA / GRID), support for which expires by the end of 2017.
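As a quick sanity check on those GP107 figures, the quoted TFLOPS numbers can be turned back into an implied clock speed. This is only a sketch: it assumes the usual convention of two FP32 operations (one fused multiply-add) per CUDA core per clock, and the resulting clocks are derived values, not figures NVIDIA published in this announcement.

```python
# Assumption: peak FP32 throughput counts one fused multiply-add
# (2 floating-point operations) per CUDA core per clock cycle.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_mhz(cuda_cores: int, tflops: float) -> float:
    """Return the clock (in MHz) implied by a core count and an FP32 TFLOPS figure."""
    flops = tflops * 1e12
    return flops / (FLOPS_PER_CORE_PER_CLOCK * cuda_cores) / 1e6


# Core counts and TFLOPS figures as quoted in the article.
cards = {
    "Quadro P600": (384, 1.195),
    "Quadro P1000": (640, 1.89),
}

for name, (cores, tflops) in cards.items():
    print(f"{name}: ~{implied_clock_mhz(cores, tflops):.0f} MHz implied clock")
```

Both cards land in the neighborhood of 1.5 GHz under that assumption, which suggests the quoted TFLOPS figures refer to boost rather than base clocks.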

The next step in the Quadro line-up is a pair of cards intended for the mid-range market: the P2000 and P4000. The Quadro P2000 is a bit of an odd product, with the same TDP as the last-gen M2000 (75W) but more than 50% higher performance. This single-slot card achieves 3.0 TFLOPS of single-precision (FP32) compute performance. Interestingly, it features 5GB of GDDR5 on a 160-bit memory controller. This is probably the first time we’re seeing this combination, and it really raises the question of why NVIDIA did not keep the full 192-bit controller of the GP106 silicon. Then again, this is probably silicon that did not qualify to be a more “grown up” part.
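The 5GB / 160-bit pairing is less strange once the bus is broken down into memory channels. The sketch below assumes one GDDR5 chip per 32-bit channel and 1GB (8Gb) chips; the chip density is my assumption for illustration and is not something the announcement confirms.

```python
GDDR5_CHANNEL_BITS = 32  # a GDDR5 chip normally sits on a 32-bit interface
ASSUMED_CHIP_GB = 1      # assumption: one 8Gb (1GB) chip per channel


def memory_config(bus_width_bits: int, chip_gb: int = ASSUMED_CHIP_GB) -> tuple[int, int]:
    """Return (number of 32-bit channels, total capacity in GB) for a bus width."""
    if bus_width_bits % GDDR5_CHANNEL_BITS != 0:
        raise ValueError("bus width must be a multiple of 32 bits")
    channels = bus_width_bits // GDDR5_CHANNEL_BITS
    return channels, channels * chip_gb


for width in (160, 192):
    channels, capacity = memory_config(width)
    print(f"{width}-bit bus: {channels} channels, {capacity}GB")
```

Under these assumptions a 160-bit bus is simply a 192-bit design with one of its six channels switched off, which is consistent with the idea of partially disabled GP106 silicon.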

The P4000 is a cut-down GP104 GPU, the same chip that in its full form powers the P5000. The P4000 is also the lowest-numbered card that NVIDIA classifies as “VR Ready”, and it is the most powerful single-slot card in the whole Pascal family (GTX 1060, 1070 and 1080 are all dual-slot cards). Still, the company kept the TDP low at just 105W, 15W lower than its predecessor (M4000). The Quadro P4000 officially delivers 5.3 TFLOPS in FP32 (single precision), double what its direct predecessor was able to reach.

The beefiest version of GP104 to come to market (given that there’s no GeForce GTX 1080 16GB, and no Tesla cards are powered by the GP104 chip) is the Quadro P5000. This is the first Quadro card with 16GB of GDDR5X memory, with support for ECC and an optional soft-ECC method. Rated at a 180W TDP, the P5000 delivers 8.9 TFLOPS of FP32 compute performance. In a way, the Quadro P5000 represents the biggest generational jump between two “5000” boards: 8GB of GDDR5 is upgraded to 16GB of GDDR5X and 4.3 TFLOPS of compute is replaced with 8.9 TFLOPS, all with just 25% more CUDA cores and 17% higher power consumption. To me personally, the P5000 is maybe the most impressive part when it comes to price/performance ratio ($2499).
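To put the line-up so far on a common scale, the TDP and FP32 figures quoted above can be combined into a rough efficiency number. This is a back-of-the-envelope sketch: TDP is a rating rather than measured power draw, so the result only indicates how the spec sheets compare.

```python
# (FP32 TFLOPS, TDP in watts) as quoted in the article.
SPECS = {
    "P600": (1.195, 40),
    "P1000": (1.89, 47),
    "P2000": (3.0, 75),
    "P4000": (5.3, 105),
    "P5000": (8.9, 180),
}


def gflops_per_watt(tflops: float, tdp_watts: float) -> float:
    """Peak FP32 GFLOPS per rated watt of TDP."""
    return tflops * 1000 / tdp_watts


efficiency = {name: gflops_per_watt(*spec) for name, spec in SPECS.items()}
best = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} GFLOPS/W")
print(f"Highest rated efficiency: {best}")
```

On paper the two GP104 cards sit at roughly 50 GFLOPS per watt, with the single-slot P4000 marginally ahead of the P5000.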

The Quadro P6000 was introduced last year, and it is the Quadro equivalent of the TITAN X and Tesla P40: a GP102 chip, 24GB of GDDR5X memory and 12 TFLOPS of FP32 compute. It replaces the M6000, which packed 12/24GB of GDDR5 and 7.0 TFLOPS of FP32, at the same price of $5000.

Finally, there’s one exotic bird: “Big Pascal”, i.e. the one and only GP100. This card probably targets real-time visualization and rendering in Hollywood/Bollywood studios and Big Oil simulations, in short every case where you used a Tesla P100 for compute and now want to visualize the results. The Quadro GP100 is not the fastest Quadro in single-precision compute: it offers 10.2 TFLOPS of FP32, while the P6000 offers about 18% more. The key difference, however, is that the Quadro GP100 brings NVLink to the world of PCIe cards. Two new connectors on top of the PCB provide 80GB/s of bi-directional bandwidth and let two cards work as one 7168-core GPU with 32GB of ultra-fast HBM2 memory. Each card’s HBM2 delivers 720GB/s, for a combined memory bandwidth of 1.44 TB/s, with the two cards linked over the already mentioned 80GB/s bi-directional NVLink.
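The paired-GP100 numbers are simple sums, and spelling them out shows where the bottleneck sits. The sketch below treats a two-card NVLink setup as an idealized single GPU; the per-card values are just half of the combined figures quoted above, and real-world scaling will depend on how much traffic has to cross the link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    cuda_cores: int
    memory_gb: int
    memory_bw_gbs: int  # local memory bandwidth in GB/s


# Per-card figures (half of the combined numbers quoted in the article).
QUADRO_GP100 = Gpu(cuda_cores=3584, memory_gb=16, memory_bw_gbs=720)
NVLINK_BW_GBS = 80  # bi-directional link bandwidth quoted in the article


def pair(gpu: Gpu) -> Gpu:
    """Idealized aggregate of two identical GPUs linked together."""
    return Gpu(
        cuda_cores=2 * gpu.cuda_cores,
        memory_gb=2 * gpu.memory_gb,
        memory_bw_gbs=2 * gpu.memory_bw_gbs,
    )


paired = pair(QUADRO_GP100)
local_to_link_ratio = QUADRO_GP100.memory_bw_gbs / NVLINK_BW_GBS
p6000_advantage_pct = (12.0 / 10.2 - 1) * 100  # FP32 TFLOPS: P6000 vs GP100

print(paired)
print(f"Local HBM2 is {local_to_link_ratio:.0f}x faster than the NVLink connection")
print(f"P6000 FP32 advantage over GP100: {p6000_advantage_pct:.1f}%")
```

The ratio is the interesting part: each GPU reaches its own HBM2 far faster than it can reach its neighbor's, so the "one big GPU" picture holds best for workloads that keep most data local to each card.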

It will be interesting to see which Hollywood movie (read: the legendary Deadpool intro scene and the Quadro M6000) or car launch (read: Jaguar I-PACE) gets “the next gen” treatment with two Quadro GP100s working in unison.