At the GPU Technology Conference, Nvidia introduced its DGX-1 supercomputer. Combining two 20-core Xeon E5 v4 processors with eight Tesla P100 cards, the DGX-1 is a 3U server that promises to deliver 85.2 TFLOPS of FP32 compute performance. For $129,000, you can order the DGX-1 today and get the ultimate performance out of a single system.
Yet at that same event, there may have been a product that already upstaged the performance of a single DGX-1 server. On the second day of the show, we encountered Supermicro’s 1U ‘Super GPU’ server. Supermicro is known as a manufacturer of ultra-dense computers, and is probably the only system designer that ships “four GPU in 1U form factor”. Its latest product deserves serious consideration for upcoming high-performance computing (HPC) designs.
Supermicro’s 1U server uses a dual PCB (Printed Circuit Board) design, with the ‘daughterboard’ occupying more space than the ‘motherboard’. The daughterboard carries eight NVLink mezzanine connectors, which host four Tesla P100 boards. The only remaining elements on the board are a 1.2 kilowatt (kW) power delivery system and two PCIe Gen3 bridge chips, which feed the Xeon E5 processors directly. To use all four Pascal-based Teslas, both Xeon processors must be installed. The design allows for two 145W Xeon E5 v4 processors, meaning you could use the same 20-core Xeon E5-2698 v4 that Nvidia utilizes in the DGX-1, or even go with a 22-core Xeon E5-2699 v4.
The system also allows for four additional PCIe connectors. In the picture above, you can see that Supermicro opted for a visualization graphics card (Quadro K6000), but you can also install a Quadro M6000 12GB or 24GB without any issues. The remaining two PCIe connectors were used for two InfiniBand cards, but the representative stated that you can also fit high-performance SSD storage, such as Intel’s DC SSD family.
What makes this system especially interesting is that in a 3U configuration (three of these 1U servers stacked), it would offer significantly more compute power than Nvidia’s own DGX-1 system. The maximum configuration of a Supermicro 3U system would be as follows (DGX-1 specs in brackets):
- 132 Broadwell-EP cores (vs. 40 cores)
- 6 TB DDR4-2133 (vs. 2 TB)
- 12 Tesla P100 (vs. 8 Tesla P100)
- 3x Quadro M6000 24 GB
- 24 TB Intel DC P3608
- 78 TB Fixstars SSD
- 9x InfiniBand connections
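The 3U totals above follow from multiplying a single 1U node by three. A minimal sketch of that arithmetic is below; the per-node CPU and Tesla counts come from the description of the server, while the per-node Quadro and InfiniBand counts are an assumption obtained by dividing the quoted 3U totals by three. The dictionary keys and the `scale` helper are illustrative names only.

```python
# Per-1U node configuration.
# cpu_cores and tesla_p100 come from the server description;
# quadro_m6000 and infiniband_links are ASSUMED (3U totals divided by 3).
NODE_1U = {
    "cpu_cores": 2 * 22,       # two 22-core Xeon E5-2699 v4
    "tesla_p100": 4,           # four Tesla P100 boards per 1U
    "quadro_m6000": 1,         # assumed: 3 per 3U / 3
    "infiniband_links": 3,     # assumed: 9 per 3U / 3
}

# DGX-1 figures quoted in the article (two 20-core Xeons, eight P100s).
DGX1 = {"cpu_cores": 2 * 20, "tesla_p100": 8}


def scale(node, count):
    """Multiply every component count of one node by the number of nodes."""
    return {name: qty * count for name, qty in node.items()}


rack_3u = scale(NODE_1U, 3)

# Compare only the components both systems list.
for name, dgx_qty in DGX1.items():
    ratio = rack_3u[name] / dgx_qty
    print(f"{name}: {rack_3u[name]} vs {dgx_qty} ({ratio:.2f}x)")
```

Under these assumptions, three nodes give 132 cores and 12 Tesla P100 boards, which lines up with the list above.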
Naturally, the price of this system would be much higher than the $129,000 for a single DGX-1, but in a head-to-head performance comparison, Supermicro’s 3U setup produces some fascinating numbers (DGX-1 figures in parentheses):
- 254.4 TFLOPS FP16 (170)
- 136.2 TFLOPS FP32 (87.8)
- 68.1 TFLOPS FP64 (43.9)
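As a rough cross-check of these figures, the sketch below sums GPU peak throughput only, using Nvidia's published per-board peaks for the NVLink Tesla P100 (21.2, 10.6 and 5.3 TFLOPS for FP16, FP32 and FP64). The helper name `gpu_only_peak` is hypothetical, and the model deliberately ignores the CPUs and Quadro cards. The FP16 result matches the number quoted above; the quoted FP32 and FP64 numbers are somewhat higher than a GPU-only sum, so they presumably count contributions that this sketch does not model.

```python
# Peak throughput per NVLink Tesla P100 board, in TFLOPS
# (Nvidia's published launch figures).
P100_TFLOPS = {"fp16": 21.2, "fp32": 10.6, "fp64": 5.3}


def gpu_only_peak(num_gpus):
    """Sum peak GPU throughput per precision, ignoring CPUs and other cards."""
    return {prec: round(tflops * num_gpus, 1)
            for prec, tflops in P100_TFLOPS.items()}


supermicro_3u = gpu_only_peak(12)   # 3 x 1U nodes, four P100 each
dgx1 = gpu_only_peak(8)             # eight P100 in one DGX-1

for prec in P100_TFLOPS:
    print(f"{prec}: {supermicro_3u[prec]} vs {dgx1[prec]} TFLOPS (GPU only)")
```

Because both systems use the same GPU, the GPU-only advantage of the 3U setup is simply 12/8, or 1.5x, at every precision.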
Supermicro expects to bring the system to market during the second half of the year, and it will be interesting to see how prices compare between the competing systems. Nvidia created an excellent baseline for Tesla P100 processors, and it is up to hardware partners such as Supermicro to beat the reference system. Judging by what we saw, the company from San Jose, CA, has potentially built a supercomputing monster. Advertising the system and finding its position on the market will be the biggest challenge for Supermicro, which wasn’t even mentioned during the keynote as a Tesla P100 supplier. Given the performance this system promises, we’re not surprised to see why.