
Pascal Secrets: What Makes Nvidia GeForce GTX 1080 so Fast?

At the inaugural North American edition of the DreamHack conference, Electronic Arts and DICE launched Battlefield 1, while Nvidia unveiled its first Pascal-based consumer cards, the GeForce GTX 1080 and 1070. Both cards are set to offer record-breaking performance per watt, and that performance enabled Nvidia to price the parts above their predecessors. In this article, we will analyze the key elements that make the second Pascal chip (GP104) even more efficient than the GP100 (Tesla P100). While the performance results are under NDA until May 17th (expect a tidal wave of reviews from the usual suspects), we are now digging into the architecture that makes the GeForce GTX 1080 ‘a screamer’.

The Pascal GPU architecture marks Nvidia’s departure from ‘one size fits all’ toward application-specific silicon, in line with the industry trend:

“We are going to define different application domains, and in those domains there will be very different selection criteria for architectures and devices,” said Tom Conte, co-chair of IEEE’s Rebooting Computing Initiative and a professor of computer science and electrical and computer engineering at Georgia Tech. “For machine learning, we’ve been running those as convolutional neural networks on GPUs, which is inefficient. And if you’re doing search optimization or simulation, you want a different architecture that does not look anything like high-performance, low-power CMOS.”

If we look into the past, GPU architectures such as Fermi, Kepler or Maxwell all had to compromise between the demands of different product lines: GeForce for gamers, Quadro for visualization, Tegra for embedded (automotive/tablets), and Tesla for computation. What changed over the past four GPU generations is the size of Nvidia as a company. When Fermi launched in April 2010, Nvidia was a $3.54 billion per year company. Last year (Fiscal 2016), Nvidia broke into the five billion club ($5.01 billion) and this year is starting to look like a $5.5-6.0 billion year.

The Basics of GeForce GTX 1080

Nvidia GeForce GTX 1080 in 2-Way SLI

In our initial talks with Nvidia and their partners, we learned that the GeForce GTX 1080 is coming to market in several shapes:

  • GeForce GTX 1080 8GB
  • GeForce GTX 1080 Founders Edition
  • GeForce GTX 1080 Air Overclocked Edition
  • GeForce GTX 1080 Liquid Cooled Edition

The stock GTX 1080 is clocked at 1.66 GHz, with GPU Boost lifting it to 1.73 GHz. The Founders Edition includes an overclocking-friendly BIOS to raise the clocks to at least 2 GHz, and the presentation showed the chip running at 2.1 GHz. The main limiting factor for overclocking beyond 2.2 GHz is the 225 W power budget, which is how much the board can officially draw: 75 W from the motherboard slot and 150 W through the 8-pin PEG connector. However, there are power supply manufacturers which provide more juice per rail, and we’ve seen a single 8-pin connector delivering 225 W on its own. Still, partners such as ASUS, Colorful, EVGA, Galax, Gigabyte and MSI are preparing custom boards with two or three 8-pin connectors. According to our sources, reaching 2.5 GHz using a liquid cooling setup such as the Corsair H115i or EK Waterblocks should not be too much of a hassle.
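The power arithmetic above can be sketched in a few lines of Python. This is only an illustration built on the two figures quoted in this article (75 W from the slot, 150 W per 8-pin connector); the function name and constants are our own, and real boards may be limited by firmware well below these totals.

```python
# Official power limits as quoted in the article.
SLOT_POWER_W = 75        # drawn from the motherboard slot
EIGHT_PIN_POWER_W = 150  # drawn through each 8-pin PEG connector


def board_power_budget(eight_pin_connectors: int) -> int:
    """Return the official power budget in watts for a board with
    the given number of 8-pin connectors (hypothetical helper)."""
    if eight_pin_connectors < 0:
        raise ValueError("connector count cannot be negative")
    return SLOT_POWER_W + eight_pin_connectors * EIGHT_PIN_POWER_W


if __name__ == "__main__":
    # Reference design (one connector) versus the custom boards
    # with two or three connectors mentioned above.
    for connectors in (1, 2, 3):
        print(f"{connectors} x 8-pin: {board_power_budget(connectors)} W")
```

By this count, a custom board with two connectors has 375 W of official headroom and one with three has 525 W, which is why partner cards are the ones expected to chase the higher clocks.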

The search for performance led the company to remove as many legacy options as possible, and you can no longer connect the GTX 1080 to an analog display. D-SUB15 is now firmly in the past, and you cannot make the connection work even if you use a 3rd party adapter. The remaining connectors are a 144Hz-capable DVI, three DisplayPort 1.4 and a single HDMI 2.0b.

GP104 Chip: Nvidia Dumps IPC for Clockspeed

Build once, sell multiple times. This secret to fame and fortune has long been a staple of numerous businesses worldwide, including the semiconductor industry. Intel builds just a limited number of core architectures, and then sells them in multiple packages (from Celeron / Pentium to Xeon E7). Even the Xeon Phi architecture originates from the P55C, the old Pentium MMX core from the 1990s.

Comparison between Maxwell and Pascal SM Core

 

In the search for absolute performance per transistor, Nvidia revised how its Streaming Multiprocessor (SM) works. When we compare GM200 with GP100 clock-for-clock, Pascal (slightly) lags behind Maxwell. This change to a more granular architecture was made in order to deliver higher clocks and more performance. Splitting the single Maxwell SM into two, and doubling the amount of shared memory, warps and registers, enabled the FP32 and FP64 cores to operate with previously unseen efficiency. For GP104, Nvidia disabled/removed the FP64 units, reducing the double-precision compute performance to a meaningless number, just as on its predecessors.

  • GP100: 15.3 billion transistors, 3840 cores, 60 SM, 4096-bit memory, 1328 MHz GPU clock
  • GP104: 7.2 billion transistors, 2560 cores, 40 SM, 256-bit memory, 1660 MHz GPU clock

What remains is single-precision (FP32) performance, which stands at 9 TFLOPS. While the GP100 chip needs to boost to 1.48 GHz in order to deliver 10.6 TFLOPS, GP104 clocks up to 1.73 GHz and that’s not the end. If you clock the GTX 1080 to 2.1 GHz, which is achievable on air, you will speed past the GP100. We can already see the developers and scientists that need single-precision performance placing orders for air and liquid cooled GTX 1080s.
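These TFLOPS figures follow from simple arithmetic. The sketch below assumes the usual convention that each core retires one fused multiply-add (counted as two floating-point operations) per clock. The core count and clocks come from this article; the function name is ours. The 10.6 TFLOPS figure for GP100 is used exactly as quoted rather than recomputed, since the same formula applied to 3840 cores at 1.48 GHz gives a higher number, which would be consistent with a part that has fewer than the full 3840 cores enabled.

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumption: one fused multiply-add per core per clock,
    counted as 2 floating-point operations.
    """
    flops_per_core_per_clock = 2
    gflops = cuda_cores * flops_per_core_per_clock * clock_ghz
    return gflops / 1000.0


GP104_CORES = 2560             # from the article
GP100_QUOTED_TFLOPS = 10.6     # from the article, not recomputed

if __name__ == "__main__":
    # Base clock, boost clock and the 2.1 GHz overclock shown on stage.
    for clock_ghz in (1.66, 1.73, 2.1):
        tflops = peak_fp32_tflops(GP104_CORES, clock_ghz)
        verdict = "above" if tflops > GP100_QUOTED_TFLOPS else "below"
        print(f"GP104 @ {clock_ghz} GHz: {tflops:.2f} TFLOPS "
              f"({verdict} the quoted GP100 figure)")
```

At the 1.73 GHz boost clock this lands just under the rounded 9 TFLOPS figure, and the 2.1 GHz overclock is the first of the three clocks to clear 10.6 TFLOPS.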

For DirectX 12 and VR, the term Asynchronous Compute was thrown around, especially since AMD Radeon-based cards were beating Nvidia GeForce cards in DirectX 12 titles such as Ashes of the Singularity and Rise of the Tomb Raider. We were told that the Pascal architecture doesn’t have Asynchronous Compute, but that there are some aspects of this feature which qualified the card for the DirectX ‘12_1’ feature level.

However, DX12 titles face another battle altogether, and that is delivering a great gaming experience. This is something where titles such as Gears of War Ultimate Edition or Quantum Break failed entirely, as Microsoft ‘screwed the pooch’ with disastrous conversions and limitations set forth by the Windows Store. Tim Sweeney even wrote an in-depth column in The Guardian stating what’s wrong with Microsoft. These days, game developers work hand in hand with both AMD and Nvidia in order to extract as much performance out of DirectX 12 as possible, which is needed for challenging VR environments.

GDDR5X: The Magic Ingredient

When GDDR5 memory came to market, the development roadmap showed two generations of product: Single-Ended and Differential GDDR5 memory. Sadly, this memory standard received little development after the launch in 2008, as memory manufacturers such as SK Hynix and Samsung Semiconductor refused to bring it to the market. The team behind the GDDR5 standard at AMD focused its attention on developing HBM (High Bandwidth Memory), leaving GDDR5 to the manufacturers, who slowly built up capacity, and that was that. A year and a half ago, after seeing that HBM1 is limited in capacity and that HBM2 memory won’t be available in real volume before 2017, Nvidia started to work with Micron’s team in Germany on building the ultimate performance GDDR5.

GDDR5X Memory Standard Presentation. Source: Micron


Manufactured on a 20nm process, GDDR5X memory proved to be overclocking-friendly even with the initial silicon. As the roadmap shows, the target was to hit 10 Gbps, i.e. 2.5 GHz QDR. Given that the memory actually moves data four times per cycle, it should be called Quad Data Rate, but the name GDDR SGRAM (Graphics Double Data Rate Synchronous Graphics Random Access Memory) was kept for continuity.

The GeForce GTX 1080 has its memory clocked at 2.5 GHz, but we do expect some of the samples to clock at 2.75-3.5 GHz (11-14 Gbps). That would raise the available bandwidth from 320 GB/s to 352-448 GB/s, and we do expect to see extreme overclockers pushing the memory even further. If Micron adopts a 10nm process for GDDR5X, we’ll get to a 4 GHz clock / 16 Gbps sooner rather than later.
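The bandwidth numbers can be reproduced from the bus width and memory clock. The sketch below is a minimal illustration using the figures in this article (256-bit bus, four transfers per clock); the function name and its default argument are our own choices.

```python
def memory_bandwidth_gb_s(bus_width_bits: int,
                          clock_ghz: float,
                          transfers_per_clock: int = 4) -> float:
    """Peak memory bandwidth in GB/s.

    Per-pin data rate (Gbps) = clock (GHz) x transfers per clock.
    Bandwidth (GB/s) = bus width (bits) x data rate (Gbps) / 8 bits per byte.
    """
    data_rate_gbps = clock_ghz * transfers_per_clock
    return bus_width_bits * data_rate_gbps / 8


GP104_BUS_WIDTH_BITS = 256  # from the article

if __name__ == "__main__":
    # Stock clock, the two expected overclocks and the 4 GHz target.
    for clock_ghz in (2.5, 2.75, 3.5, 4.0):
        bandwidth = memory_bandwidth_gb_s(GP104_BUS_WIDTH_BITS, clock_ghz)
        print(f"{clock_ghz} GHz ({clock_ghz * 4:g} Gbps): {bandwidth:g} GB/s")
```

By the same formula, the hoped-for 16 Gbps parts would deliver 512 GB/s on a 256-bit bus.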

Pascal vs. Maxwell: Charging More for Performance

At the end of the day, the cost of doing custom silicon is passed on to the consumer. A GeForce GTX 1080 will set you back $50-150 more than the original GTX 980, while the GTX 1070 will set you back $50-120 more than the GTX 970:

  • $329 GeForce GTX 970
  • $379 GeForce GTX 1070
  • $449 GeForce GTX 1070 Founders Edition
  • $549 GeForce GTX 980
  • $599 GeForce GTX 1080
  • $649 GeForce GTX 980 Ti
  • $699 GeForce GTX 1080 Founders Edition
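The price premiums quoted above can be checked directly against this list. The snippet below is just that arithmetic; the dictionary keys and the helper name are ours, and the prices are the launch prices listed in this article.

```python
# Launch prices in US dollars, as listed in the article.
PRICES_USD = {
    "GTX 970": 329,
    "GTX 1070": 379,
    "GTX 1070 Founders Edition": 449,
    "GTX 980": 549,
    "GTX 1080": 599,
    "GTX 980 Ti": 649,
    "GTX 1080 Founders Edition": 699,
}


def premium_usd(new_card: str, old_card: str) -> int:
    """Price difference in dollars between a new card and the card
    it is being compared against (hypothetical helper)."""
    return PRICES_USD[new_card] - PRICES_USD[old_card]


if __name__ == "__main__":
    comparisons = [
        ("GTX 1080", "GTX 980"),
        ("GTX 1080 Founders Edition", "GTX 980"),
        ("GTX 1070", "GTX 970"),
        ("GTX 1070 Founders Edition", "GTX 970"),
        ("GTX 1080 Founders Edition", "GTX 980 Ti"),
    ]
    for new_card, old_card in comparisons:
        print(f"{new_card} vs {old_card}: +${premium_usd(new_card, old_card)}")
```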

The GTX 1080 Founders Edition will set you back $50 more than the GTX 980 Ti, which was essentially a ‘Titan X Lite’. That same Titan X is a $999 product where the GM200 silicon has to split its duties between the demands of completely different products:

  • $649 GeForce GTX 980 Ti
  • $999 GeForce GTX Titan X
  • $3999 Quadro M6000 12GB
  • $4299 Tesla M40 24GB
  • $4999 Quadro M6000 24GB

This difference in price mostly comes from the design choices the company made: GP104 is the GPU for gaming and it is focused solely on gaming. While it costs more than the GTX 980 or GTX 980 Ti, it does offer a 10% performance boost when compared to a $1000 product, the GeForce GTX Titan X. For example, on the same system a GeForce GTX 1080 Founders Edition will score 9400 3DMarks, while the Titan X could not get past 8500. Again, May 17th will bring a number of reviews of the GTX 1080 and GTX 1070, where some bold claims will be put to the test.

GeForce GTX 1080 is the most power efficient card Nvidia has ever produced.

This applies especially to the performance-per-watt, i.e. performance-per-transistor, claims we saw at launch, which were incorrectly translated into 2x performance in games. Both the GeForce GTX 1080 and 1070 are the best the company has launched so far, and the attention to detail is just staggering. By pairing the GP104 silicon with GDDR5X memory, Nvidia is achieving performance levels we haven’t seen so far at this price point.

Still, it will be interesting to see how the R9 Nano, Fury and Fury X stand the test of time, since we’re now seeing that the three-year-old Hawaii architecture is handily beating the GeForce cards of the same vintage. In the battle between these two heated competitors, it is the customers who end up with better products.

We’re waiting for our GTX 1080 samples, and look forward to putting the boards through their paces.

  • Ben Moran

    Glad to know about the fact that they’re upgrading the parts they use for these cards now and still produce consumer friendly prices. Seeing the current charts and news about the 1080 just gets me more excited in upgrading from a 970. I hope to see more how it’ll perform in 4K and VR as well.

    https://versus.com/en/nvidia-geforce-gtx-1080

    • globula neagra

      So the fact that nvidia have started to charge the gaming industry the premium that they charge for the professional one is to be taken softly, I see what you did there John Snow, can`t wait for the 1080TI at 1k and Titan at 2k 🙂

      • Mast3r Race

        1080ti at $799 and Titan at $1199. That’s my guess. The Titan will have HBM2 though for sure, not positive if ti will. They will also be competing against AMD’s Vega which is going to be there premium 16nm cards. I guess Polaris is focusing more on mobile, console, and low end cards that still perform very well.

        • Behemothnl

          Where does it state that they use 16nm as all news is about 14nm.
          Also heard was that polaris wasactually more round up for the actual planar 20nm process and vega a new architecture that was more set for the 14nm FF.

          • Corey

            AMD slides show Polaris made for finFET. Vega might use the slightly larger process so they don’t effect production of mainstream GPU’s? not really sure just speculating. Pretty certain polaris 10 and 11 will use 14nm finFET to bring the power consumption down. AMD really wants to bring VR to cheaper low powered devices.

          • Both 14nm GlobalFoundries and 16nm TSMC process use FinFET transistor design exclusively. There’s no planar transistor design available on these nodes. Vega used to be called Greenland, a part of Arctic Islands family of GPUs. When AMD reorganized inside and launched RTG, their roadmap was renamed:

            Greenland – Vega 10 – R9 “Fury 2”
            Ellesmere – Polaris 10 – R9 480
            Baffin – Polaris 11 – R7 470

            GlobalFoundries and Samsung Electronics are in charge of AMD product, as both use the same process node (GlobalFoundries, IBM and Samsung used to be in a joint organization called Common Platform, which developed the manufacturing process).

  • wargamer1969

    Looking forward to seeing the difference from a 980ti at 4k.

    • Mast3r Race

      10%-15%. Nothing that major in terms of gains but when you consider that your comparing a full GM200 big chip vs a GP104 it’s real impressive. As you know the 1080 should be compared to the 980 to compare the real performance gains chip for chip.

      • Jared

        I wouldn’t listen to Master, since no one knows yet unless he’s an NVIDIA spy… rumors are that it varies a lot, because when some of the new features are implemented properly in games, the performance difference can be huge, according to a rumor. But current, non-DX12 games with no optimization will STILL probably be higher than 10-15% with reasonable OC.

        • Corey

          The big thing that I think will appeal to a lot of people who do a bit more than game is their compute power. This will be crucial as devs like the ones that develop the mercury engine for adobe start to support CUDA to openCL or port CUDA to AMD GPU’s. Given AMD know how to make high powered compute cards especially for to size of the node it might be possible for people to jump sides if they know they are going to be supported.

  • Screwyluie

    this is the first I’ve heard it officially said there’s not async compute, I’m wondering if there’s a source for that claim (not that I doubt it, just want to know where it came from)

    • dd101

      Read analysis at end of http://www.mobipicker.com/nvidia-gtx-1080-directx-12-benchmarks-async-compute/
      “This new hardware scheduler will play a crucial role in allowing Pascal GPUs to perform better at executing tasks asynchronously, even though it still evidently relies on pre-emption and context switching according to what Nvidia has revealed in its Pascal whitepaper.

      So while this scheduler doesn’t actually allow tasks to be executed asynchronously it will still improve the performance of Pascal GPUs when it comes to executing code that’s written asynchronously. It’s sort of a hack to hold Pascal off until proper async compute is implemented in Nvidia’s future architectures.”

  • mariano guntin

    looks like a bunch of marketing lies.

  • gamer

    gtx 1000 series from nvidia are high technic fastets gpu on market.
    sure market is full average budget gpu’s,but,who cares

    any who have cash can build gpu,with alot hardware and just run it.

    only bad think si that that kind guilded gpu, is totally waiste and take alot watt and run heat.
    .
    but just that.. and i mean gtx 1000 series great effyency and low heat and temp and also wisper silent run,make its best.
    shortly,gtx 1000 series gpu’s take even less food from electric walls and still are almost twice speed for games,4K is also 50fps true.
    also,newest technical for light and games new options included gtx gpu’s.

    no1 get eveen near..

    for climp that place,takes huge engineer and testing work,as well millions and millions dollars. thats why they cost more than average budget gpu’s.
    they are worth that. sure.

    anyway, i support comapany who make great effyency,fast low heat and silent running gpu’s, of coz.,and cooling from air.

    watercooled gpu’d must denied for law. its not these days,we live 21 century.

    gpu’s what need watercool,or not run high wffywncy and make alor heat and noisy are lausy buildes,cheap them builders and out of day.

    who want and need thouse.
    even its cost 50$

    gpu like that is old and not value after 6 month.