AMD’s Folding performance explained, future development revealed

Following the article about Top graphics cards for Folding@Home, it seems I managed to get some doors opened and receive answers from people closely involved with the project.
I was lucky enough to be contacted by people who were or still are involved with the project, and their answers were quite interesting. Names will remain unrevealed, of course. ;-) To keep the article clear, I have simplified some items that came up in the discussions; I will try to keep it both technical and simple. Impossible task, I know.
On to the matter, then: the reason for ATI’s problems lies in the fact that ATI has had a client for several hardware generations. Going back to the beginning, Dr. Vijay Pande (head of the F@H project) and Mike Houston (GPGPU pioneer, now an employee of AMD) demonstrated the Folding@Home client around two years ago, using an ATI Radeon X1900 as the basis for the demonstration.
The Problem
And here lies the problem with the current GPU client: ATI X1K hardware has one big flaw, the lack of a local memory share between the shader units. As you probably know, Nvidia designed G80 and the GPUs that followed with shaders in groups of 8 units, with a cache shared between them. According to our sources, it is that cache issue that stops ATI from achieving greatness, because we heard claims that their VLIW shader arrangement works in “best in class” mode.

The reason for GeForce dominance lies in the purple bar - scratch cache

Then again, the problem in gaming with X1K, and later R600 and RV670, was the relative lack of texture units (TMUs), and the problem with GPGPU continued to be the local share. You might now be wondering what happens if you don’t put that “scratch cache” in the GPU. What happens is that your CPU is constantly polled, and this drags performance into the gutter.
We heard a lot of technical details about that particular issue, and about the difference in scaling between a dual-core CPU and a quad-core one. All in all, quite interesting stuff. But there is one large point to be made: Nvidia is so successful with CUDA because it offered what companies needed (scratch cache, CUDA, math libraries), while ATI suffers from having selected Brook+ as its bread and butter until OpenCL comes along.
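To give a feel for why a scratch cache shared by a group of shaders matters, here is a toy Python model. It is only a sketch under my own assumptions, not how the Folding@Home client or any GPU actually works: I assume an all-pairs workload where every thread needs every particle's position, and I simply count reads that have to leave the shader group. The function names and the particle count are hypothetical; the group size of 8 is the G80 grouping mentioned above.

```python
def reads_without_local_share(n_particles: int) -> int:
    """Off-chip reads when threads cannot share data (toy model).

    Assumption: one thread per particle, and every thread fetches
    every particle's position by itself.
    """
    return n_particles * n_particles


def reads_with_local_share(n_particles: int, group_size: int) -> int:
    """Off-chip reads when a group of threads shares a scratch cache (toy model).

    Assumption: each group loads every position once into its shared
    scratch area, and all threads in the group reuse that copy.
    """
    n_groups = -(-n_particles // group_size)  # ceiling division
    return n_groups * n_particles


if __name__ == "__main__":
    n = 1024     # hypothetical particle count
    group = 8    # shader group size quoted in the article for G80
    without = reads_without_local_share(n)
    shared = reads_with_local_share(n, group)
    print(f"without local share: {without} reads")
    print(f"with local share:    {shared} reads")
    print(f"reduction factor:    {without / shared:.1f}x")
```

In this simplified model the saving equals the group size, because every value fetched once is reused by all threads in the group. Real hardware is far messier, but the direction of the effect is the point.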
RV770 saves the day…or not?
The RV770 GPU, better known as the Radeon 4800 series, is a vast improvement over previous generations. GPGPU-wise, the most important thing is the introduction of local share, since every 10 shaders got their “slice of the pie”. But GPGPU is a more complex field than just “here is the feature, we can all use it now”.
Our sources repeatedly criticized Brook+, claiming it is not in sync with AMD’s own CTI and Stream SDK. Brook+ allegedly breaks “with new drivers, with old drivers”, “whatever can go wrong, it can” and so on.
ATI’s hardware now has local share, but support for it has to be hard-coded into Brook+. AMD recently released the 1.21 Beta Stream SDK featuring local share, but that same support has to come inside Brook+ as well.
The Solution: Q1’09
So, we have shown you the problem, and now it is time for the solution. ATI can’t fix the performance issue on previous-generation hardware, but it will solve a multitude of issues on Radeon 4800 boards. The team at Stanford is taking the necessary steps to redo the workflow and introduce local memory share. This could take months, so a realistic goal is to have a new client in Q1’09.
Once the Radeon 4870 is fully utilized, those 800 shaders running at 70% of their theoretical value (700-800 GFLOPS instead of 1-1.2 TFLOPS) should be good enough to reach the level of the GTX280.
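For those who want to check the arithmetic behind those numbers, here is a small Python sketch. The 800 shaders and the 70% figure come from the text above; the 750 MHz core clock and the counting of one multiply-add as 2 FLOPs per shader per clock are my assumptions for a reference Radeon 4870, and the function names are made up for illustration.

```python
SHADERS = 800                    # from the article
FLOPS_PER_SHADER_PER_CLOCK = 2   # assumption: one multiply-add counted as 2 FLOPs
CORE_CLOCK_MHZ = 750             # assumption: reference Radeon 4870 core clock
UTILIZATION_PERCENT = 70         # from the article


def peak_gflops(shaders: int, flops_per_clock: int, clock_mhz: int) -> float:
    """Theoretical peak in GFLOPS: shaders x FLOPs per clock x clock (MHz) / 1000."""
    return shaders * flops_per_clock * clock_mhz / 1000


def sustained_gflops(peak: float, utilization_percent: int) -> float:
    """Throughput at a given utilization, expressed as a percentage of peak."""
    return peak * utilization_percent / 100


if __name__ == "__main__":
    peak = peak_gflops(SHADERS, FLOPS_PER_SHADER_PER_CLOCK, CORE_CLOCK_MHZ)
    print(f"theoretical peak: {peak:.0f} GFLOPS")
    print(f"at {UTILIZATION_PERCENT}% utilization: "
          f"{sustained_gflops(peak, UTILIZATION_PERCENT):.0f} GFLOPS")
```

Under these assumptions the peak works out to 1.2 TFLOPS, and 70% of that is about 840 GFLOPS, a little above the rounded 700-800 GFLOPS range quoted above; 70% of the lower 1 TFLOPS bound gives 700 GFLOPS.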

The next story update will bring some views and opinions from AMD folk.

  • Boris

    Great Article Mate

  • someguy

    What the heck are you talking about… It’s clear you didn’t actually talk to anyone at Stanford or AMD about the performance issues. At least AMD’s client is stable.

  • Hi “someguy”, I am going to publish article with AMD’s responses and announcements about the future.

    Now, as far as a stable client goes, I don’t have anything against you, but most of the folders in my team use ATI cards, including myself. Sorry, but I got many more “UNSTABLE_MACHINE” strings on ATI hardware than on nVidia hardware…

    Again, I am using ATI Radeon 4870X2 for folding. Now adding couple of GTX280 boards.

  • Tom

    Any update yet?

    It has been almost two months since this article was posted.

    • Hi Tom,

      the completely new article is being planned for the new site, expect it in a week or so 😉 There is announcement coming on the site on Thursday.

  • Tom

    First of all, sorry to hear about your grandfather.

    Has ATI or Stanford given you any info yet so you can finish your article?

  • Hi Tom, the new site is launching within days time, and ATI Folding article is one of launch articles.

    Not long now 😉

  • waiting…

    tick tock, tick tock

  • Tom

    I give up.

  • noob1

    i love my ATI 4890 card, i can’t waste its powers!! ;_;

    i hope they come out with the OpenCL docs and tutorials. Brook+ is a joke.

  • Useful information. Great blogging. write more! Thank you.

  • Not sure if anyone here knows this, but in April 2012 folding-at-home systems running Radeon cards on Windows-XP stopped working. It seems that AMD no longer supports OpenCL drivers on Windows-XP, so upgrading to version 12 causes you to lose OpenCL. The only option is to switch your card to Nvidia or switch your OS to Linux (not sure people will pay $$$ to upgrade to Windows-Vista or Windows-7; it might not even be possible on old hardware)