At the inaugural edition of the North American DreamHack conference, Electronic Arts and DICE launched Battlefield 1, while Nvidia unveiled its first Pascal-based consumer cards, the GeForce GTX 1080 and 1070. Both cards are set to offer record-breaking performance per watt, and that performance enabled Nvidia to price the parts above their predecessors. In this article, we will analyze the key elements that make the second Pascal chip (GP104) even more efficient than the GP100 (Tesla P100). While the performance results are under NDA until May 17th (expect a tidal wave of reviews from the usual suspects), we are now digging into the architecture that makes the GeForce GTX 1080 ‘a screamer’.
The Pascal GPU architecture marks Nvidia’s departure from ‘one size fits all’ toward application-specific silicon, in line with the industry trend:
“We are going to define different application domains, and in those domains there will be very different selection criteria for architectures and devices,” said Tom Conte, co-chair of IEEE’s Rebooting Computing Initiative and a professor of computer science and electrical and computer engineering at Georgia Tech. “For machine learning, we’ve been running those as convolutional neural networks on GPUs, which is inefficient. And if you’re doing search optimization or simulation, you want a different architecture that does not look anything like high-performance, low-power CMOS.”
If we look into the past, GPU architectures such as Fermi, Kepler or Maxwell all had to compromise between the demands of different product lines: GeForce for gamers, Quadro for visualization, Tegra for embedded (automotive/tablets), and Tesla for computation. What changed over the past four GPU generations is the size of Nvidia as a company. When Fermi launched in April 2010, Nvidia was a $3.54 billion per year company. Last year (Fiscal 2016), Nvidia broke into the five billion club ($5.01 billion) and this year is starting to look like a $5.5-6.0 billion one.
The Basics of GeForce GTX 1080
In our initial talks with Nvidia and their partners, we learned that the GeForce GTX 1080 is coming to market in several shapes:
- GeForce GTX 1080 8GB
- GeForce GTX 1080 Founders Edition
- GeForce GTX 1080 Air Overclocked Edition
- GeForce GTX 1080 Liquid Cooled Edition
The stock GTX 1080 is clocked at 1.66 GHz, with GPU Boost lifting it to 1.73 GHz. The Founders Edition includes an overclocking-friendly BIOS to raise the clocks to at least 2 GHz, and the presentation showed the chip running at 2.1 GHz. The main limiting factor for overclocking beyond 2.2 GHz is the 225 W that the board can officially draw: 75 W from the motherboard slot and 150 W through the single 8-pin PEG connector. However, some power supply manufacturers provide more juice per rail, and we’ve seen a single 8-pin connector delivering 225 W on its own. Still, partners such as ASUS, Colorful, EVGA, Galax, Gigabyte and MSI are preparing custom boards with two or three 8-pin connectors. According to our sources, reaching 2.5 GHz using a liquid cooling setup such as the Corsair H115i or EK Waterblocks should not be too much of a hassle.
The search for performance led the company to remove as many legacy options as possible, and you can no longer connect the GTX 1080 to an analog display. D-SUB15 is now firmly in the past, and you cannot make the connection work even if you use a 3rd party adapter. The remaining connectors include a 144Hz-capable DVI, three DisplayPort 1.4 and a single HDMI 2.0b connector.
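The power ceiling described above is simple addition, and a minimal sketch makes it easy to see what the custom boards gain. The function and constant names here are our own, and the figures are the official ratings quoted in the text (75 W from the slot, 150 W per 8-pin connector), not measurements of what a given power supply can actually deliver.

```python
# Official power ratings quoted in the article, in watts.
SLOT_POWER_W = 75        # drawn through the motherboard slot
EIGHT_PIN_POWER_W = 150  # official rating of one 8-pin PEG connector


def board_power_budget(eight_pin_connectors: int) -> int:
    """Official power ceiling for a board with the given number of 8-pin plugs."""
    if eight_pin_connectors < 0:
        raise ValueError("connector count cannot be negative")
    return SLOT_POWER_W + eight_pin_connectors * EIGHT_PIN_POWER_W


# Reference design (one connector) versus custom boards (two or three).
for connectors in (1, 2, 3):
    print(f"{connectors} x 8-pin: {board_power_budget(connectors)} W")
```

By this count, a custom board with two 8-pin connectors has 375 W of official headroom and one with three has 525 W, compared with 225 W for the reference layout.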
GP104 Chip: Nvidia Dumps IPC for Clockspeed
Build once, sell multiple times. This secret to fame and fortune has long been a staple of numerous businesses worldwide, including the semiconductor industry. Intel builds just a limited number of core architectures, and then sells them in multiple packages (from Celeron / Pentium to Xeon E7). Even the Xeon Phi architecture originates from the P55C, the old Pentium MMX core from the 1990s.
In the search for absolute performance per transistor, Nvidia revised how its Streaming Multiprocessor (SM) works. When we compare GM200 with GP100 clock for clock, Pascal (slightly) lags behind Maxwell. This change to a more granular architecture was made in order to deliver higher clocks and more performance. Splitting the single Maxwell SM into two, and doubling the amount of shared memory, warps and registers, enabled the FP32 and FP64 cores to operate with previously unseen efficiency. For GP104, Nvidia disabled/removed the FP64 units, reducing double-precision compute performance to a meaningless number, just as on its predecessors.
- GP100: 15.3 billion transistors, 3840 cores, 60 SM, 4096-bit memory, 1328 MHz GPU clock
- GP104: 7.2 billion transistors, 2560 cores, 40 SM, 256-bit memory, 1660 MHz GPU clock
What remains is single-precision (FP32) performance, which stands at 9 TFLOPS. While the GP100 chip needs to boost to 1.48 GHz in order to deliver 10.6 TFLOPS, GP104 clocks up to 1.73 GHz, and that’s not the end. If you clock the GTX 1080 to 2.1 GHz, which is achievable on air, you will speed past the GP100. We can already see developers and scientists who need single-precision performance placing orders for air and liquid cooled GTX 1080s.
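The TFLOPS figures above follow from the usual peak-throughput rule of thumb: cores times two operations per clock (one fused multiply-add) times clock speed. The sketch below reproduces them under that assumption. Note that the 10.6 TFLOPS figure only works out with 3584 active cores, the count on the shipping Tesla P100 (56 of 60 SMs enabled), rather than the 3840 of the full GP100 die listed above; that core count is our addition, not something stated in the text. Function names are illustrative.

```python
def fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPS.

    Assumption: every core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cuda_cores * 2 * clock_ghz / 1000.0


def clock_to_match(target_tflops: float, cuda_cores: int) -> float:
    """Clock (GHz) a chip with `cuda_cores` needs to reach `target_tflops`."""
    return target_tflops * 1000.0 / (cuda_cores * 2)


GP104_CORES = 2560
P100_ENABLED_CORES = 3584  # assumption: 56 of 60 SMs active on Tesla P100

p100_peak = fp32_tflops(P100_ENABLED_CORES, 1.48)
print(f"Tesla P100 at 1.48 GHz: {p100_peak:.2f} TFLOPS")

for clock in (1.73, 2.1):
    print(f"GTX 1080 at {clock} GHz: {fp32_tflops(GP104_CORES, clock):.2f} TFLOPS")

print(f"GTX 1080 clock needed to match: {clock_to_match(p100_peak, GP104_CORES):.2f} GHz")
```

Under these assumptions the GTX 1080 lands at roughly 8.9 TFLOPS at its 1.73 GHz boost clock and needs a little over 2.07 GHz to draw level with the Tesla P100, which is why a 2.1 GHz overclock is enough to pass it on paper.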
For DirectX 12 and VR, the term Asynchronous Compute was thrown around, especially since AMD Radeon-based cards were beating Nvidia GeForce cards in DirectX 12 titles such as Ashes of the Singularity and Rise of the Tomb Raider. We were told that the Pascal architecture doesn’t have Asynchronous Compute, but that some aspects of this feature qualified the card for DirectX 12 feature level 12_1.
However, DX12 titles face another battle altogether, and that is delivering a great gaming experience. This is something where titles such as Gears of War Ultimate Edition or Quantum Break failed entirely, as Microsoft ‘screwed the pooch’ with disastrous conversions and limitations set forth by the Windows Store. Tim Sweeney even wrote an in-depth column in The Guardian stating what’s wrong with Microsoft. These days, game developers work hand in hand with both AMD and Nvidia in order to extract as much performance out of DirectX 12 as possible, which is needed for challenging VR environments.
GDDR5X: The Magic Ingredient
When GDDR5 memory came to market, the development roadmap showed two generations of product: Single-Ended and Differential GDDR5 memory. Sadly, this memory standard received little development after its launch in 2008, as memory manufacturers such as SK Hynix and Samsung Semiconductor refused to bring the second generation to market. The team behind the GDDR5 standard at AMD focused its attention on developing HBM (High Bandwidth Memory), leaving GDDR5 to the manufacturers, who slowly built up capacity, and that was that. A year and a half ago, after seeing that HBM1 is limited in capacity and that HBM2 memory won’t be available in real volume before 2017, Nvidia started to work with Micron’s team in Germany on building the ultimate performance GDDR5.
Manufactured on a 20nm process, GDDR5X memory proved to be overclocking-friendly even with the initial silicon. As the roadmap shows, the target was to hit 10 Gbps, i.e. 2.5 GHz QDR. Given that the memory actually transfers data four times per clock cycle, it should be called Quad Data Rate, but the name GDDR SGRAM (Graphics Double Data Rate Synchronous Graphics Random Access Memory) was kept for continuity.
The GeForce GTX 1080 has its memory clocked at 2.5 GHz, but we do expect some samples to clock at 2.75-3.5 GHz (11-14 Gbps). That would raise the available bandwidth from 320 GB/s to 352-448 GB/s, and we do expect to see extreme overclockers pushing the memory even further. If Micron adopts a 10nm process for GDDR5X, we’ll get to a 4 GHz clock / 16 Gbps sooner rather than later.
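The bandwidth figures above can be checked with a short calculation: the per-pin data rate is the memory clock times four transfers per cycle, and total bandwidth is that rate multiplied by the 256-bit bus width and divided by eight bits per byte. The following is a minimal sketch with function names of our own choosing; the clocks and bus width are the ones quoted in the article.

```python
def data_rate_gbps(memory_clock_ghz: float, transfers_per_clock: int = 4) -> float:
    """Per-pin data rate, assuming quad data rate signalling as described above."""
    return memory_clock_ghz * transfers_per_clock


def bandwidth_gb_per_s(bus_width_bits: int, rate_gbps: float) -> float:
    """Total memory bandwidth in GB/s (8 bits per byte)."""
    return bus_width_bits * rate_gbps / 8


BUS_WIDTH_BITS = 256  # GP104 memory interface width

# Stock clock, the expected overclocking range, and the hoped-for 4 GHz part.
for clock_ghz in (2.5, 2.75, 3.5, 4.0):
    rate = data_rate_gbps(clock_ghz)
    bandwidth = bandwidth_gb_per_s(BUS_WIDTH_BITS, rate)
    print(f"{clock_ghz} GHz -> {rate:g} Gbps -> {bandwidth:g} GB/s")
```

The first three rows reproduce the 320, 352 and 448 GB/s figures in the text. By the same arithmetic, a 16 Gbps part on the same 256-bit bus would reach 512 GB/s.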
Pascal vs. Maxwell: Charging More for Performance
At the end of the day, the cost of doing custom silicon is passed on to the consumer. A GeForce GTX 1080 will set you back $50-150 more than the original GTX 980, while the GTX 1070 will set you back $50-120 more than the GTX 970:
- $329 GeForce GTX 970
- $379 GeForce GTX 1070
- $449 GeForce GTX 1070 Founders Edition
- $549 GeForce GTX 980
- $599 GeForce GTX 1080
- $649 GeForce GTX 980 Ti
- $699 GeForce GTX 1080 Founders Edition
The GTX 1080 Founders Edition will set you back $50 more than the GTX 980 Ti, which was essentially a ‘Titan X Lite’. That same Titan X is a $999 product in which the GM200 silicon has to split its duties between the demands of completely different products:
- $649 GeForce GTX 980 Ti
- $999 GeForce GTX Titan X
- $3999 Quadro M6000 12GB
- $4299 Tesla M40 24GB
- $4999 Quadro M6000 24GB
This difference in price mostly comes from the design choices the company made: GP104 is the GPU for gaming and it is focused solely on gaming. While it costs more than the GTX 980 or GTX 980 Ti, it does offer a roughly 10% performance boost when compared to a $1000 product, the GeForce GTX Titan X. For example, on the same system a GeForce GTX 1080 Founders Edition will score 9400 3DMarks, while the Titan X could not get past 8500. Again, May 17th will bring a number of reviews of the GTX 1080 and GTX 1070, where some bold claims will be put to the test.
That applies especially to the performance-per-watt and performance-per-transistor claims we saw at launch, which were incorrectly translated into 2x performance in games. Both the GeForce GTX 1080 and 1070 are the best the company has been able to launch so far, and the attention to detail is just staggering. By optimizing the GP104 silicon with GDDR5X memory, Nvidia is achieving performance levels we haven’t seen so far at this price point.
Still, it will be interesting to see how the R9 Nano, Fury and Fury X stand the test of time, since we’re now seeing that the three-year-old Hawaii architecture is handily beating the GeForce cards of the same vintage. In the end, in the battle between these two heated competitors, it is the customers who end up with better products.
We’re waiting for our GTX 1080 samples, and look forward to putting the boards through their paces.