The fastest ATI 5870 card achieves 3 TFLOPS!

A few days ago, SmoothCreations introduced its own ATI Radeon 5870 card. Available only inside SmoothCreations systems, the Radeon HD 5870 1GB XOC Havoc is nothing short of brilliant. The Cypress GPU is clocked at 950 MHz, while the 1GB of GDDR5 memory ticks at 1.4 GHz QDR [5600 MT/s or 5.6 "GHz"].

A 950 MHz GPU clock is nothing to be sneezed at: you get 3.04 TFLOPS of computing power, a boost of more than 10% over the original. That clock bump takes the card beyond 3 TFLOPS, making it the first single ASIC ever to do so. With the 5850X2 and 5870X2, we get shivers when we consider that a single board will deliver well over 5 TFLOPS. At a price in the range of 100 bucks per TFLOPS, we know a scientist or two who have wet dreams about these babies. Bear in mind that prior to the rise of the GPGPU movement, a single TFLOPS of computing power required 200-250 CPUs in their server-grade commercial versions [AMD Opteron, Intel Xeon], meaning 1 TFLOPS = $112,500.
Looking at the internals of the Cypress GPU at 950 MHz, you get L1 cache bandwidth of 1.18 TB/s [yep, terabytes per second], almost 500 GB/s of L1-to-L2 cache transfer [486 GB/s], and an overall video memory bandwidth of a breathtaking 179.2 GB/s.
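For readers who want to check the headline numbers, here is a minimal sketch of the arithmetic. It assumes figures the article does not state: 1600 stream processors on Cypress, one multiply-add (two floating-point operations) per processor per clock, and a 256-bit memory bus. The function names are purely illustrative.

```python
# Assumed Cypress figures (not given in the article): 1600 stream
# processors, 2 FLOPs per processor per clock, 256-bit memory bus.
STREAM_PROCESSORS = 1600
FLOPS_PER_CLOCK = 2
BUS_WIDTH_BITS = 256


def peak_tflops(core_mhz):
    """Theoretical single-precision peak in TFLOPS for a given core clock."""
    return STREAM_PROCESSORS * FLOPS_PER_CLOCK * core_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(mem_clock_ghz, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s; QDR means 4 transfers per clock."""
    data_rate_gt = mem_clock_ghz * transfers_per_clock  # GT/s per pin
    return data_rate_gt * BUS_WIDTH_BITS / 8            # bits -> bytes


print(round(peak_tflops(950), 2))           # 3.04
print(round(memory_bandwidth_gbs(1.4), 1))  # 179.2
```

Under those assumptions, the 950 MHz core clock and the 1.4 GHz QDR memory clock reproduce the 3.04 TFLOPS and 179.2 GB/s figures quoted above. These are theoretical peaks, not measured throughput.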

Since no reviewer of reference boards has achieved this kind of overclock, we decided to ask SmoothCreations how it was possible not just to reach it, but actually to ship 950 MHz as the GPU clock with full product warranty.

Mario Gastellum, director of product development and engineering at SmoothCreations, told us what steps the company took to ensure such clocks: "We over clocked the crap out of the card. Luckily, we bumped the voltage and the GPU fan is pushed to 20% more. AMD has always been good to us when it came to BIOS tweaking." But the secret sauce goes by the name of BIOS tweaking: "we adjusted the memory timings" speaks volumes about just how high you can push GDDR5 memory. We knew that Joe Macri and his team at AMD created a memory standard that supports an "overclocking frenzy", but to see memory working at a 40% higher clock [than rated] is something we did not expect. The Evergreen family of GPU chips comes with a memory controller fully optimized for GDDR5, and the ECC function built into every GDDR5 chip means you can "drive the cells" to points not yet seen in the world of DDR SDRAM.

With this single GPU harnessing 179.2 GB/s, Hemlock [5850X2, 5870X2] boards could deliver single-PCB bandwidth ranging from 281.6 GB/s on the 5850X2 to 306 GB/s on the 5870X2. If the dual boards can sustain this level of memory tweaking, 358.4 GB/s sounds realistic for this generation of products from ATI. As for nVidia, the fact that the company is sticking with a 512-bit memory controller should result in bandwidth of 256 GB/s for a baseline 1.0 GHz memory configuration. Sources inside nVidia also told us that GDDR5 memory is insanely clockable: "GDDR5 certainly likes to play ball". We expect to end 2009 with both parties having 300GB/s+ products. And that is just insane.
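The board-level figures follow from the same bandwidth formula, summed over the GPUs on one PCB. The sketch below assumes a 256-bit bus per Cypress GPU and four transfers per clock for GDDR5; the helper name is hypothetical.

```python
def board_bandwidth_gbs(mem_clock_ghz, bus_width_bits, gpus=1,
                        transfers_per_clock=4):
    """Aggregate peak memory bandwidth of one board in GB/s."""
    per_gpu = mem_clock_ghz * transfers_per_clock * bus_width_bits / 8
    return per_gpu * gpus


# Dual-GPU board with both GPUs at the 1.4 GHz memory clock
# (256-bit bus per GPU is an assumption).
print(round(board_bandwidth_gbs(1.4, 256, gpus=2), 1))  # 358.4

# Single GPU with a 512-bit controller at a 1.0 GHz baseline.
print(round(board_bandwidth_gbs(1.0, 512), 1))          # 256.0
```

By the same formula, the 281.6 GB/s figure would correspond to two GPUs with 1.1 GHz memory, and roughly 306 GB/s to two GPUs at 1.2 GHz. Those clocks are back-calculated here, not stated by our sources.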

We thank Mario for giving us an insight into the intricacies of BIOS on the 5800 series of products.