100 PFLOPS: China’s Supercomputer Circumvents U.S. Sales Ban

China's Tianhe-2 is the world's fastest supercomputer, at 33 PFLOPS demonstrated and 55 PFLOPS theoretical performance.

A year ago, we revealed that the U.S. State Department had blocked further sales of Intel Xeon and Xeon Phi processors to Chinese institutions, most notably for the Tianhe-2 supercomputer. The U.S. administration also blocked a move in which a China-based investment fund would have invested in AMD, which was one of the original reasons for creating Radeon Technologies Group. Even without that investment, the group is performing above and beyond its financial capabilities.

The reason for moving against Tianhe-2 is complicated yet simple: ever since its debut in June 2013, the Tianhe-2 supercomputer from NUDT (National University of Defense Technology) has sat on top of the TOP500 list of the world’s fastest computers. From the looks of it, Tianhe-2 (the name translates to ‘Milky Way’) will keep sitting on top even after the launch of the U.S. supercomputers Summit and Sierra (IBM + Nvidia), as well as Aurora and Theta (Intel).

With its 32,000 Intel Xeon E5-2692 v2 processors and 48,000 Intel Xeon Phi 31S1P co-processors, Tianhe-2 delivers a fantastic peak performance of 54.9 PFLOPS and a sustained performance of 33.86 PFLOPS. What is little known is that Tianhe-2 is not a fully built supercomputer. In fact, Tianhe-2 has been operating at roughly 50% capacity, as the original target for the system was 100 PFLOPS peak and 80 PFLOPS sustained.
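To put those figures side by side, here is a minimal Python sketch that computes the ratio of sustained to peak performance for the current system and for the original target. It uses only the numbers quoted above; the function and variable names are ours, chosen for illustration.

```python
# Figures quoted in the article for the current Tianhe-2 configuration.
RPEAK_PFLOPS = 54.9   # theoretical peak
RMAX_PFLOPS = 33.86   # sustained performance

# Original design targets mentioned in the article.
TARGET_RPEAK_PFLOPS = 100.0
TARGET_RMAX_PFLOPS = 80.0


def efficiency(sustained: float, peak: float) -> float:
    """Return the fraction of peak performance that is actually sustained."""
    return sustained / peak


current = efficiency(RMAX_PFLOPS, RPEAK_PFLOPS)
target = efficiency(TARGET_RMAX_PFLOPS, TARGET_RPEAK_PFLOPS)
installed_share = RPEAK_PFLOPS / TARGET_RPEAK_PFLOPS

print(f"Current sustained/peak ratio: {current:.1%}")
print(f"Target sustained/peak ratio:  {target:.1%}")
print(f"Share of target peak installed: {installed_share:.1%}")
```

The current system sustains a little under 62% of its peak, while the original target implied 80%, so the full build would need to be noticeably more efficient as well as larger.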

According to our sources, China did not react the way the current administration expected. Rather than applying pressure with (empty) threats that would affect commerce between the world’s two largest economies, China redirected all the funds intended for Intel and other foreign vendors into the development of in-house Alpha and ARM superprocessors, which have the potential to beat the traditional x86 architecture. In terms of funds, NUDT had planned to buy 32,000 more Xeon processors (this time based on Haswell-E) and 48,000 more Xeon Phi co-processors. We’ve been hearing that over $500 million was invested in bringing the Chinese silicon from the prototype phase to production grade.

The New Tianhe-2: Meet the 100 PFLOPS Supercomputer

At the 2016 Supercomputing Frontiers conference in Singapore, we learned the first details of the fully developed Tianhe-2 supercomputer, scheduled to debut in June 2016 during the 2016 International Supercomputing Conference in Frankfurt, Germany. This system is expected to deliver over 100 PFLOPS peak performance, and keep the crown of the world’s fastest (super)computer.

The new Tianhe-2 is a hybrid design featuring two new additions, as the old Xeon Phi cards are being phased out. Phytium Technologies recently delivered its “Mars” processors in the form of PCI Express cards that replace the Xeon Phi cards, along with motherboards to upgrade the system. Given that there are 48,000 add-in boards installed, the new 64-core design enables the system to reach its original performance targets. With three million new ARM cores inside Tianhe-2, its estimated theoretical peak (Rpeak) performance should exceed 100 PFLOPS.
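The "three million" figure follows directly from the board count. A quick Python check, assuming one 64-core Mars processor per add-in board (the article does not confirm the per-board chip count, and a dual-chip board would double the result):

```python
# Numbers from the article; one processor per board is an assumption.
ADD_IN_BOARDS = 48_000
CORES_PER_PROCESSOR = 64
PROCESSORS_PER_BOARD = 1  # assumption for illustration

total_cores = ADD_IN_BOARDS * PROCESSORS_PER_BOARD * CORES_PER_PROCESSOR
print(f"Total ARM cores: {total_cores:,}")
```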

Should Tianhe-2 reach its full deployment of 32,000 Xeons, 32,000 ShenWei processors, and 96,000 Phytium accelerator cards, we might see an upgrade into the range of 200-300 PFLOPS, provided the building can withstand the associated thermal and power challenges.

Meet Phytium Mars, a 64-core ARM Superprocessor

Meet the world's first 64-core, 64-bit Xiaomi processor

In August 2015, a little-known company, Phytium Technologies, planned to demonstrate its “Mars” processors at the Hot Chips conference in Cupertino, CA. However, its lead scientist was denied a visa to enter the U.S., and we could not see the physical boards that featured this extremely powerful processor. The slide above shows the base architecture of the initial engineering sample; the final delivered boards feature significantly higher performance specifications.

Mars processor silicon

While we were not able to see the final silicon, we know that performance went up almost threefold, and that the final production board delivers 1.5 TFLOPS of compute power, most probably in a dual-chip arrangement (akin to the Tesla K80 and FirePro S9300 x2).
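As a rough sanity check, the sketch below multiplies the quoted per-board figure by the 48,000 installed boards and compares the result with the 100 PFLOPS target. It assumes every board delivers the full 1.5 TFLOPS; the split of the remaining performance among Xeons, ShenWei processors and other parts is not given in the article, so it is left as a single remainder.

```python
# Numbers from the article.
ADD_IN_BOARDS = 48_000
TFLOPS_PER_BOARD = 1.5
TARGET_PEAK_PFLOPS = 100.0

# 1 PFLOPS = 1,000 TFLOPS
accelerator_pflops = ADD_IN_BOARDS * TFLOPS_PER_BOARD / 1_000

# Whatever the accelerator boards do not cover must come from the rest
# of the system; the article gives no breakdown for that part.
remaining_pflops = TARGET_PEAK_PFLOPS - accelerator_pflops

print(f"Accelerator boards: {accelerator_pflops:.1f} PFLOPS")
print(f"Needed from the rest of the system: {remaining_pflops:.1f} PFLOPS")
```

On these assumptions the Mars boards alone account for roughly 72 PFLOPS of peak, leaving about 28 PFLOPS to be supplied by the host processors.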

There are several implementations of this processor in Tianhe-2: an add-in card that replaces the Xeon Phi, and motherboards featuring upgradable memory, all using very affordable DDR3-1600 memory. Phytium Technology delivered motherboards with multiple processors and up to 256 GB per Mars processor. A typical implementation achieves what the company calls a “triple 64”: 64-bit ARM cores inside a 64-core processor attached to 64 GB of memory. That memory sits on an 8-channel interface, not the 16-channel interface mentioned in the slides, which applies to onboard (G)DDR memory.
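For a sense of what an 8-channel DDR3-1600 interface offers, here is a back-of-the-envelope Python estimate of the theoretical peak memory bandwidth and the per-channel capacity of the "triple 64" configuration. It assumes each channel is a standard 64-bit (8-byte) DDR3 channel, which the article does not state, and real sustained bandwidth would be lower.

```python
# DDR3-1600 performs 1600 million transfers per second per channel.
TRANSFERS_PER_SECOND_M = 1600
BYTES_PER_TRANSFER = 8      # assumption: standard 64-bit channel width
CHANNELS = 8                # from the article
MEMORY_GB = 64              # typical configuration from the article

per_channel_mb_s = TRANSFERS_PER_SECOND_M * BYTES_PER_TRANSFER
total_mb_s = per_channel_mb_s * CHANNELS
gb_per_channel = MEMORY_GB // CHANNELS

print(f"Per channel: {per_channel_mb_s / 1000:.1f} GB/s")
print(f"All {CHANNELS} channels: {total_mb_s / 1000:.1f} GB/s theoretical peak")
print(f"Capacity per channel: {gb_per_channel} GB")
```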

The bottom line is that the sales restriction enabled a small startup to deliver a product that achieves higher performance than the products it was supposed to replace. All in all, it is a win for NUDT and for a small company that ‘no one ever heard of’. We will see how the market develops, and whether there is room for Phytium Technology in the supercomputing market. Tianhe-2 might be just the beginning.

Also, this is not the only development coming from mainland China. Jiāngnán Computing Lab successfully developed a new multi-core Alpha processor. Considered a sixth generation design, ShenWei Alpha processors achieve more than 1 TFLOPS of compute performance. However, we were not able to confirm what volumes are involved with the new batch of ShenWei processors. What makes them mysterious is the fact that Wikipedia only lists three generations of their Alpha processors, while the scientists are talking about fifth, sixth and seventh generations.

  • Izumi Laryukov

    Why is China pulling so far ahead of the USA so quickly? This has far more worrisome implications than people seem to understand. I can only hope that the USA will successfully develop an exascale or more preferably a general use quantum computer in less than 3 years.

    • Sadly, the problem has more faces. The U.S. administration should protect U.S. technology. But if the administration decides to protect it, then it should do more than let the MIPS architecture be sold to China, which is more critical than x86 (all nuclear warheads, as well as the majority of the U.S. arsenal, run on MIPS processors), and deny visas to people who worked as x86 and GPU architects (one of the fathers of G80 lives in mainland China today)… it is a problem that has multiple faces. But all that limitations will do is stimulate innovation. I am glad to see Alpha coming back into the frame after being f***** royally by a semi-legal deal/strategy between DEC, Compaq and HP. HP killed Alpha due to its investment in Intel’s Itanium, and we all know how that ended.

      • Testerty

        If you disallow chips with MIPS or x86 from being sold to China, you are going to kill off Boeing and GM sales to China too (because these machines run on computers)… it is throwing the baby out with the bath water. Either way, it sucks.

  • TheDizz

    LOL American govt being dicks and getting creamed after being dicks. All power to China!


    After building TIANHE 2, the most powerful petaflop-class supercomputer in the world at 33.86 petaflops, that is, 33.86 million billion calculations per second, China has gained a two-year lead in building its exaflop-class supercomputer, which will be 100 percent operational in the course of 2020. As a reminder, 1 exaflop = 1 billion billion calculations per second. TIANHE 3 will be 1,000 times more powerful than the TERA 100 and will be the equivalent of the TERA SEQUANA, the CEA's steel monster, which will match TIANHE 3 in power; TIANHE 3 should follow the architectural lineage of TIANHE 2. Of France, the USA, China, Japan, Russia and Holland, which will be the first nation to finish building and functionally testing its exaflop-class supercomputer? It will most probably be China, which likes to place first in the TOP 500 of the world's most powerful supercomputers. But the Chinese do not intend to stop there: in 2050 they will build a zettaflop-class supercomputer that will be immense and cyclopean; 1 zettaflop equals 1,000 billion billion calculations per second. In 2050, three zettaflop-class supercomputers will be built: one for the USA, one for China, and a third for the countries of the whole world that contribute to financing the SZI (Supercalculateur Zettaflopique International, the International Zettaflop Supercomputer). The zettaflop-class supercomputer is the technological optimum of binary computing; a binary yottaflop-class supercomputer would be financially and technically impossible. To build something more powerful, at 10 zetta, 100 zetta or a yottaflop, a quantum supercomputer will have to be built. Nothing stops man in his drive to build bigger and more powerful; we may wonder how far man's folly will take us.
    As of today, 16 January 2017, we do not yet know the names of the exaflop-class supercomputers of the USA (two of them), Russia, Holland and Japan; total secrecy seems to be the rule for these four nations. Just as I did for the TERA SEQUANA, the great pride of the French and of the CEA, and for TIANHE 3, I will write a block of text for each of the 5 other exaflop-class supercomputers. This scientific block of text will be posted under 20 YouTube videos relating to TIANHE 3 and 100 times on GOOGLE. It will also be published in David Mocchetti's Facebook journal, which is a FREE science journal along the lines of Sciences et Avenir. My son David is autistic, so I am the one who writes his journal. So see you on Facebook, Google and YouTube.

    Alain Mocchetti
    Mechanical Construction & Automation Engineer
    Five-year university degree (Bac + 5), 1985
    UFR Sciences de Metz