
GPU Computing shows superior efficiency in Australian Outback


While preparing for the recently held nVidia GPU Technology Conference, we came across a session named "Diesel-powered GPU Computing: Enabling a Real-Time Radio Telescope in the Australian Outback." The title intrigued us, and after seeing the keynote by Hanspeter Pfister from Harvard University, we are quite encouraged by what we saw.

All the talk about supercomputers with thousands of central or graphics processors is nice, but what if you need TFLOPS of compute power to analyze data 500 miles from civilization? With no power grid in sight, the only source of electricity is a good old trusty diesel generator capable of outputting 20 kilowatts. The question "why would you need TFLOPS of computing power in the middle of nowhere" has quite a simple answer: "We humans ARE the reason."

Listening to the Universe courtesy of MWA

Unfortunately, our inter-connected world has brought scientists to their knees when it comes to listening to the Universe. In order to "listen to space", scientists today need powerful radio telescopes that can see through Earth’s atmosphere and filter out all the "junk" we produce through tens of thousands of AM and FM radio, cellphone, TV, Wi-Fi and other signals. To add to the irony, it turns out that the most useful spectrum for listening to space is exactly the overcrowded AM and FM spectrum.

This is also the reason why radio telescopes are usually located "in the middle of nowhere". Currently, scientists rely on a very small number of installations around the world, such as the Arecibo Observatory in Puerto Rico [used for SETI research and detecting near-Earth objects] or the VLA, the world’s largest radio telescope array, located in New Mexico.

The cost of building an installation such as the VLA’s 27-antenna array is measured in hundreds of millions of dollars, yet such an installation is considered inefficient, as it covers a measly 3% of the sky.
In collaboration with the Harvard-Smithsonian Center for Astrophysics, MIT [MIT Haystack Observatory and MIT Kavli Institute], Australia’s CSIRO and several other institutes, the Murchison Widefield Array is being brought to life.

MWA [Murchison Widefield Array] promises to be everything that conventional installations such as the aforementioned VLA or the upcoming Chinese FAST cannot even dream of achieving. By installing hundreds or even thousands of small antennas in a grid, MWA is a highly efficient design [without moving parts] that can cover up to 20 degrees of sky above us. This almost seven-fold increase in what radio telescopes can achieve opens a whole new world for scientists and could, of course, provide a whole lot more answers than we have today: the creation of the Universe, the Search for Extra-Terrestrial Intelligence and Earth collision detection among them.

The win-win part of the MWA design is that it is much cheaper to build as well…

MWA Prototype
To be effective, a prototype of MWA was placed "in the middle of nowhere" [yes, these scientists lead very social lives], deep in the Australian Outback. Thanks to Google Street View, you can even see the prototype installation yourself [head to http://maps.google.com and type "328 Mullewa Carnarvon Rd, Murchison, WA, Australia", then scroll northbound until you see shiny metal to the west].

One of 32 prototype clusters featuring 16 antennas – the shadow on the left represents an average human.

The prototype installation consists of 32 antenna clusters, each formed of 16 small antennas. The goal is to spread 512 clusters over one square kilometer [roughly 8,000 antennas]. The antennas will observe frequencies between 80 and 300 MHz, trying "to discover low-frequency radio phenomena that have never been seen before."

Currently, the prototype installation is being calibrated, with installation of the full 512-cluster array starting during 2010. Completion is planned for 2020, and this is where high-efficiency computing enters the frame.

Outback Computing or How GPUs rule the power efficiency rating
Diesel-powered computing? Yep, this is the diesel engine. And no, it can't give you more power than physically possible

As you might have guessed, the location isn’t exactly reachable by a conventional power grid, so a diesel-powered generator is used. The generator provides a limited supply of electricity, resulting in a very stingy budget for computing. The unfortunate part is that the array in its current shape needs 20 TFLOPS of computing power, with an overall power budget of just 20 kW.

This is how the data processing pipeline works...

You would need around 200 CPUs for the job: 200 Xeon 5500 CPUs at 3.2 GHz [100 GFLOPS each] would consume a grand total of 24,000 Watts [and that’s not counting the rest of the computer needed for these CPUs to function]. The overall estimate for a CPU-based setup was in excess of 55,000 Watts [55 kW], and obviously it was a dead end.
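The CPU arithmetic can be checked with a quick back-of-the-envelope sketch. The throughput and power totals come from the figures quoted here; the 120 W per CPU value is an assumption obtained by dividing the quoted 24,000 Watts by 200 CPUs, and the function name is purely illustrative.

```python
import math


def cpus_needed(target_gflops: float, gflops_per_cpu: float) -> int:
    """Return how many CPUs are needed to reach the target throughput."""
    return math.ceil(target_gflops / gflops_per_cpu)


TARGET_GFLOPS = 20_000   # 20 TFLOPS required by the array (from the article)
GFLOPS_PER_CPU = 100     # quoted peak per Xeon 5500 (from the article)
WATTS_PER_CPU = 120      # assumption: 24,000 W divided by 200 CPUs
POWER_BUDGET_W = 20_000  # diesel generator output (from the article)

cpu_count = cpus_needed(TARGET_GFLOPS, GFLOPS_PER_CPU)
cpu_power_w = cpu_count * WATTS_PER_CPU

print(f"CPUs needed: {cpu_count}")
print(f"Power for the CPUs alone: {cpu_power_w} W")
print(f"Over the generator budget by: {cpu_power_w - POWER_BUDGET_W} W")
```

Even before counting motherboards, memory and cooling, the processors alone would draw more than the generator can deliver.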

In October 2006 nVidia launched CUDA, following up with GeForce 8 hardware in November 2006. According to scientists from this project, this was the beginning of a breakthrough, with GeForce GTX 280 and Tesla C1060 cards winning the computing challenge. In around 1 kW of power, scientists managed to squeeze 4.5 TFLOPS [dual GTX 295 card], meaning "only" 5.5 kW is needed for 20 TFLOPS.

With upcoming Fermi-based cards, the Australians expect to build a 20 TFLOPS setup using only 3.3 kW. Yes, ATI Radeon HD 5870 cards are more efficient, but unfortunately for AMD, these scientists cannot wait until OpenCL is complete, nor do they want to be forced to buy AMD CPUs in order to get a 100% functional ATI Stream work environment on Radeon or FirePro GPUs, and so on. This is as real as it gets, folks. Millions of dollars are at stake, and these scientists selected nVidia hardware, regardless of what we mortals think.
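To compare the setups on equal terms, the quoted throughput and power figures can be converted to GFLOPS per watt. This is only a rough sketch using the peak numbers from the article; the labels and the helper function are illustrative, and the CPU entry counts the processors alone.

```python
def gflops_per_watt(tflops: float, kilowatts: float) -> float:
    """Convert a (TFLOPS, kW) pair into GFLOPS per watt."""
    return (tflops * 1000.0) / (kilowatts * 1000.0)


# (TFLOPS, kW) pairs as quoted in the article
setups = {
    "Xeon 5500 (CPUs only)": (20.0, 24.0),
    "GTX 295-based node": (4.5, 1.0),
    "Fermi (projected)": (20.0, 3.3),
}

efficiency = {
    name: gflops_per_watt(tflops, kw) for name, (tflops, kw) in setups.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} GFLOPS/W")
```

On these figures the GPU node delivers several times more work per watt than the CPU option, which is what matters when the generator sets the ceiling.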

A milestone project for cGPU/GPGPU/GPU Computing?
To put the numbers into perspective, a fully grown array will generate 1 Tbps of data and require 1 EFLOPS [Exa] to process it all. Given that the infrastructure is being built around nVidia’s hardware, there is a pretty good chance that nVidia cGPUs will power what is currently considered to be the world’s first EFLOPS computer.

MWA after completion in 2020: Compare the footprint of the antennas to "inefficient" old-school dishes

At present, it isn’t known where the Australians will build the supercomputer, but given the ideal location for harnessing solar power, don’t be surprised if the EFLOPS supercomputer ends up with an 8,000-antenna array on its left and several square kilometers of solar panels on its right.

If you want to run the numbers, brace yourself: to build a 1 EFLOPS supercomputer today, you would need quite an interesting number of chips: 666,666 Fermi-based Tesla cards with 3.99 PetaBytes of GDDR5 memory. If you wanted 1 EFLOPS from the world’s most powerful CPUs, you would end up with the Intel Xeon W5590 at 3.33 GHz, and you would need around 10 million of them. In both cases, EFLOPS is unattainable, but GPUs look to be in prime position to reach "<20,000 chips for 1EFLOPS".
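For readers who want to retrace the exascale figures, here is a minimal sketch. The per-chip values are assumptions implied by the totals quoted here rather than official specifications: roughly 1.5 TFLOPS and 6 GB per Fermi-based Tesla card, and roughly 100 GFLOPS per Xeon W5590. Results match the article to within rounding.

```python
EXAFLOPS = 10**18  # 1 EFLOPS expressed in FLOPS


def units_needed(target_flops: int, flops_per_unit: int) -> int:
    """Ceiling division: whole chips needed to reach the target."""
    return -(-target_flops // flops_per_unit)


# Assumed per-chip figures, back-derived from the article's totals
FERMI_FLOPS = 1_500_000_000_000  # ~1.5 TFLOPS per Tesla card (assumption)
FERMI_MEMORY_GB = 6              # ~6 GB GDDR5 per card (assumption)
XEON_FLOPS = 100_000_000_000     # ~100 GFLOPS per Xeon W5590 (assumption)

gpu_cards = units_needed(EXAFLOPS, FERMI_FLOPS)
gpu_memory_pb = gpu_cards * FERMI_MEMORY_GB / 1_000_000  # GB to PB (decimal)
cpu_chips = units_needed(EXAFLOPS, XEON_FLOPS)

# Throughput each chip would need for the "<20,000 chips" goal
tflops_per_chip_goal = EXAFLOPS / 20_000 / 10**12

print(f"Fermi cards needed: {gpu_cards:,}")
print(f"Total GPU memory: {gpu_memory_pb:.2f} PB")
print(f"Xeon CPUs needed: {cpu_chips:,}")
print(f"Per-chip target for 20,000 chips: {tflops_per_chip_goal:.0f} TFLOPS")
```

The last line shows what the "<20,000 chips" goal implies: each chip would have to deliver about 50 TFLOPS.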

One thing is certain: the more computing power is available, the more of it scientists can use. But efficiency comes first, especially in the form of diesel-powered supercomputing.