At the 2016 GPU Technology Conference, Nvidia hosted a packed session with at least 1,000 people in attendance. The topic was a deep dive into DRIVE PX 2, the autonomous driving development kit that will start shipping in its full-performance configuration later this year; the units currently going out to Tier 1 customers ship with Maxwell-class GPUs only.
Hosted by Shri Sundaram, the session gave good insight not just into how DRIVE PX 2 works, but also into what Nvidia is bringing to market in the second half of the year: two new 16nm FinFET processors, a Pascal-based Tegra and a low-end Pascal GPU for embedded markets.
DRIVE PX 2 Overview: Tegra Meets Pascal
DRIVE PX 2 is a rapid prototyping system for the automotive industry. It uses two next-gen Tegra processors and two discrete Pascal graphics processors, the latter for GPGPU compute. Each Tegra connects directly to a Pascal GPU over a PCIe Gen 2 x4 link (total bandwidth: 4.0 GB/s).
As the picture above shows, the next-gen Tegra is a hexa-core (6-core) design with two second-generation “Denver2” cores and a cluster of four Cortex-A57 cores. The integrated Pascal GPU is connected to the CPU cores over an intra-chip fabric with significantly higher bandwidth than the external link. Shri said that during internal testing the company was unable to saturate the PCIe bus, and that the decision to go with PCIe x4 was the right one.
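The 4.0 GB/s quoted for the Gen 2 x4 link can be reproduced from the published PCIe signalling rates. The snippet below is a minimal sketch of that arithmetic, not anything shown in the session: the helper name `pcie_bandwidth_gbs` is made up for illustration, and it assumes the 4.0 GB/s figure counts both directions of the link.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
# Gen 1 and Gen 2 use 8b/10b encoding, Gen 3 uses 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_bandwidth_gbs(generation, lanes, bidirectional=False):
    """Peak payload bandwidth in GB/s (hypothetical helper, 1 GB = 10**9 bytes)."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    per_lane = rate_gt * efficiency / 8  # GB/s per lane, one direction
    total = per_lane * lanes
    return total * 2 if bidirectional else total


one_way = pcie_bandwidth_gbs(2, 4)
both_ways = pcie_bandwidth_gbs(2, 4, bidirectional=True)
print(f"PCIe Gen 2 x4: {one_way:.1f} GB/s per direction, {both_ways:.1f} GB/s combined")
```

For comparison, the same helper gives roughly 3.9 GB/s per direction for a Gen 3 x4 link.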
Each Tegra and Pascal pair forms a stackable cluster, and the clusters talk to one another over a bi-directional 1 GbE connection. The design focuses on the ‘Safety Critical’ (SC) aspect, as it is intended for use in real-world scenarios, not just in a lab.
The Tegra processor uses UMA (Unified Memory Architecture) and attaches to 8GB of LPDDR4 memory in a dual-channel configuration, achieving approximately 51.2 GB/s. This comes from the fastest configuration available: a 128-bit interface connected to two memory chips, each capable of 25.6 GB/s. Nvidia uses Samsung chips, but if a vendor were to use Micron’s 2133 MHz chips, a theoretical bandwidth of roughly 68 GB/s is possible (4,266 MT/s across the same 128-bit interface). This is more than the highest bandwidth figures that Skylake-based processors achieve on desktop and mobile (i7-6700K with DDR4-2133: 34.13 GB/s, DDR4-2666: 42.66 GB/s).
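These memory figures all follow from one formula: bus width in bytes multiplied by the transfer rate. Below is a minimal sketch of it. The function name is illustrative, and the 3,200 MT/s rate for the Tegra's LPDDR4 is an assumption inferred from the quoted 51.2 GB/s over a 128-bit bus rather than a number given in the session.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_mt_s):
    """Theoretical peak bandwidth in GB/s for a bus width and a rate in MT/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s / 1000


# (bus width in bits, transfer rate in MT/s)
configs = {
    # Assumed LPDDR4-3200; the rate is inferred, not stated in the talk.
    "Tegra, LPDDR4, 128-bit": (128, 3200),
    # Dual-channel DDR4 means two 64-bit channels, i.e. 128 bits in total.
    "i7-6700K, DDR4-2133, dual-channel": (128, 2133),
    "i7-6700K, DDR4-2666, dual-channel": (128, 2666),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.2f} GB/s")
```

These are peak numbers. Sustained bandwidth in real workloads is lower on every platform.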
The discrete Pascal GPU is an entry-level, low-voltage part with a 128-bit interface connected to eight GDDR5 memory chips clocked at 1.25 GHz QDR, for a total bandwidth of 80 GB/s. The clock can scale to 1.5 GHz QDR for a total of 96 GB/s.
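The GDDR5 numbers can be checked the same way, treating QDR as four data transfers per clock. This is a rough sketch with an illustrative function name. The per-chip width it prints assumes the 128-bit bus is split evenly across the eight chips, which the session did not confirm.

```python
def gddr5_bandwidth_gbs(bus_width_bits, clock_ghz, transfers_per_clock=4):
    """Peak bandwidth in GB/s for a quad-pumped (QDR) memory interface."""
    data_rate_gt_s = clock_ghz * transfers_per_clock  # GT/s per pin
    return data_rate_gt_s * bus_width_bits / 8


BUS_WIDTH_BITS = 128
NUM_CHIPS = 8

for clock_ghz in (1.25, 1.5):
    bandwidth = gddr5_bandwidth_gbs(BUS_WIDTH_BITS, clock_ghz)
    print(f"{clock_ghz} GHz QDR on {BUS_WIDTH_BITS}-bit: {bandwidth:.0f} GB/s")

# Assumption: the bus is divided evenly between the chips.
print(f"Bits per chip: {BUS_WIDTH_BITS // NUM_CHIPS}")
```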
DRIVE PX 2 Interface: 70 Gbps of IO
Perhaps the most impressive part of DRIVE PX 2 is the amount of bandwidth the design offers. If you’re an embedded engineer, seeing a figure such as 8.75 GB/s of total combined I/O bandwidth in a car can only make your mouth water. The car industry relies on a decades-old serial standard called CAN (Controller Area Network), which tops out at 1 Mbps (0.125 MB/s) in its classic form, while the fastest protocol inside the car has been MOST150, an optical network which, as you might have guessed, peaks at 150 Mbps (18.75 MB/s).
With 8.75 GB/s of available bandwidth, the DRIVE PX 2 system comfortably supports up to 12 high-resolution cameras, LiDAR and similar sensors, whose input can be as high as 1.5 Gbps (187.5 MB/s) each. Twelve cameras in their full configuration consume up to 20 Gbps, which is why Nvidia is bringing DRIVE PX 2 to market with the highest bandwidth of any integrated solution.
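The section mixes bits and bytes per second, so a quick conversion helps keep the figures straight. The sketch below uses only numbers quoted above; the constant names are made up for illustration, and the camera total assumes all twelve cameras run at the 1.5 Gbps peak.

```python
BITS_PER_BYTE = 8

TOTAL_IO_GBPS = 70.0   # headline I/O figure
CAMERA_GBPS = 1.5      # peak input per camera, as quoted
NUM_CAMERAS = 12
MOST150_MBPS = 150.0


def bits_to_bytes(rate):
    """Convert a rate in bits per second to bytes per second (same prefix)."""
    return rate / BITS_PER_BYTE


total_io_gb_s = bits_to_bytes(TOTAL_IO_GBPS)            # GB/s
camera_mb_s = bits_to_bytes(CAMERA_GBPS * 1000)         # MB/s
most150_mb_s = bits_to_bytes(MOST150_MBPS)              # MB/s
all_cameras_gbps = CAMERA_GBPS * NUM_CAMERAS            # Gbps
camera_share = all_cameras_gbps / TOTAL_IO_GBPS

print(f"Total I/O: {total_io_gb_s} GB/s")
print(f"One camera: {camera_mb_s} MB/s")
print(f"MOST150: {most150_mb_s} MB/s")
print(f"{NUM_CAMERAS} cameras: {all_cameras_gbps} Gbps ({camera_share:.0%} of total I/O)")
```

At the quoted peak, twelve cameras come to 18 Gbps, in line with the "up to 20 Gbps" figure and around a quarter of the 70 Gbps total.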
Power Is Nothing Without Control: Software Stack
To paraphrase a commercial for a tire manufacturer, all the hardware capabilities are useless if you can’t control them in a Safety Critical, AUTOSAR environment. To perform in real-world conditions, DRIVE PX 2 runs a combination of operating systems supported by Nvidia’s in-house designed hypervisors. The system ships fully configured with Nvidia’s own Vibrante Linux distribution, the QNX RTOS, and third-party monitors such as Infineon’s AURIX MCU.
For example, EasyCAN integration was done in collaboration with Elektrobit, which integrated full CAN bus support into the DRIVE PX 2 system. Other third-party partners include Velodyne (LiDAR), Point Grey (cameras) and many more.
We end our initial coverage with the details you might want to know. DRIVE PX 2 has already shipped to the initial Tier 1 customers at a price of approximately $15,000 per unit. This is perhaps the biggest strength of DRIVE PX 2: it costs significantly less than a stand-alone, multi-GPU computer rack, which has to be wired separately into the test vehicle and needs several kW to run. Do note that the Pascal-based DRIVE PX 2, the one we describe in this article, should ship during the third quarter of 2016.
Nvidia recommends using its own in-house developed DIGITS supercomputer to develop software for DRIVE PX 2, which represents another $15,000 investment. Still, given that we have been privy to development costs in the automotive industry, this system has the potential to become ubiquitous across the car industry, and this is where Nvidia will win or fail. One of the most common complaints from the automotive vendors we talked to concerned the issues they experience with Nvidia and other hardware vendors. For example, Nvidia Automotive focuses solely on its Tier 1 vendors and refuses to work with smaller manufacturers, such as supercar makers, even though the cost of designing a custom solution is a ‘non-issue’ for these companies.
That has always been the weakest point in Nvidia’s armor: superb hardware and software capabilities sometimes get bogged down by a lack of human resources. Once the first self-driving cars from Tier 1s hit the market, we might see a change in attitude as the Automotive division gains more engineers and more support for customers who have no issue with the price of the system.
Besides great hardware, Nvidia will also need to disrupt the ‘no can do’ mindset that is prevalent in the automotive industry.