A Japanese supercomputing center has laid claim to the holy grail of computing: it has developed and implemented the world’s first Petaflop computing system, called MDGRAPE-3 (Molecular Dynamics GRAvity PipE). I’ll drop some deeper, darker statistical and technical details later, but the 10-second overview is that the system, built at Riken in Yokohama, Japan, will break the Petaflop barrier (a million billion floating point operations per second) and may even achieve as much as 1.4 Petaflops, according to one source. To put this in perspective, BlueGene/L, the US’s fastest system, is currently running at 280 Teraflops (a Teraflop is a trillion floating point operations per second), which means the fastest US-based computer system will come in at about 1/5 the speed of the new Japanese system if the 1.4 Petaflop figure holds.

The system is built around the idea of taking standard computing systems and adding specialized, extremely fast accelerator chips. These chips only work well with specific types of codes, but that’s the whole idea for supercomputers, so this isn’t really a downside. In this case the special codes deal with N-Body problems (explanation shortly), which are needed for things like galaxy simulation (how our world works), protein folding (picture an HIV vaccine), Molecular Dynamics (Mach 5 commuter jets), etc. The idea of using specialized accelerator cards is also taking hold here in the US. There are several companies developing specialty accelerator cards, ClearSpeed being one of them. There is also a lot of research and development going into useful FPGA (Field Programmable Gate Array) accelerators. Cray is doing quite a bit of research into this, along with some other up-and-coming companies. So, the US is behind, but at least we’re still in the race.

The new system will be built around 201 systems with 24 MDGRAPE-3 LSI chips each (this supposedly gives 4808 processors; my math says 4824), with another 64 4-way Xeon servers (256 procs total) and 37 2-way Xeon servers (74 procs total). In the scheme of things this is tiny for large HPC systems, very tiny.
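The processor tally is easy to recheck. Here is a quick sketch using only the counts quoted above; the variable names are mine, not anything from Riken:

```python
# Processor tally from the counts quoted in the post.
mdgrape_hosts = 201      # systems that carry accelerator chips
chips_per_host = 24      # MDGRAPE-3 LSI chips per system

mdgrape_chips = mdgrape_hosts * chips_per_host   # 4824, not the reported 4808
xeon_processors = 64 * 4 + 37 * 2                # 4-way plus 2-way Xeon servers

print("MDGRAPE-3 chips:", mdgrape_chips)
print("Xeon processors:", xeon_processors)
print("Gap versus the reported 4808:", mdgrape_chips - 4808)
```

The 16-chip gap might just mean some hosts carry fewer chips than others, but that is a guess on my part.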

Let’s put this in perspective: BlueGene/L (US leader) vs. the new MDGRAPE-3.

Number of processors

BlueGene/L 130,000 Processors

MDGRAPE 4808 Processors (1/27 the number of processors)

Total Floating Point Operations per Second (FLOPS)

BlueGene/L 280,000,000,000,000 per second

MDGRAPE 1,400,000,000,000,000 per second (5 times faster)

Cost per flop

BlueGene/L $140 per gigaflop

MDGRAPE $15 per gigaflop (roughly 1/9 the cost per flop)

Energy Efficiency

BlueGene/L 6 Watts per Gigaflop

MDGRAPE 0.1 Watt per Gigaflop (1/60th the power consumption)

Total Cost

BlueGene/L Over $100 Million. (Research cost included with price)

MDGRAPE $9 Million (Lots of cost absorbed by NEC/Intel/SGI as chip development costs)

Summation: Japan 5, US 0. We’ve been spanked. At least for now.
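For anyone who wants to redo the ratios in the comparison, here is a rough sketch. It takes the post's figures as given (280 Teraflops and 1.4 Petaflops for speed) and treats "Over $100 Million" as exactly $100 Million, which is an assumption and makes that ratio a lower bound:

```python
# Figures as quoted in the post; total_usd for BlueGene/L is a lower bound.
bluegene = {
    "processors": 130_000,
    "flops": 280e12,          # 280 Teraflops
    "usd_per_gflop": 140.0,
    "watts_per_gflop": 6.0,
    "total_usd": 100e6,       # "over $100 Million" (assumed floor)
}
mdgrape = {
    "processors": 4_808,
    "flops": 1.4e15,          # 1.4 Petaflops (upper estimate)
    "usd_per_gflop": 15.0,
    "watts_per_gflop": 0.1,
    "total_usd": 9e6,
}

def ratio(key):
    """BlueGene/L figure divided by the MDGRAPE-3 figure."""
    return bluegene[key] / mdgrape[key]

for key in bluegene:
    print(f"{key}: BlueGene/L is {ratio(key):.2f}x MDGRAPE-3")
```

A ratio below 1 on the flops line means BlueGene/L is the slower machine; everywhere else a bigger ratio means BlueGene/L uses more of that resource.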

Now, what is an N-Body problem? Imagine modeling the evolution of the universe in a computer. To do this right you need to take ALL of the billions of stars (you can cheat and do a rough cut with only a few million representative stars), get their current positions in relation to all the other stars in the universe, calculate the forces between them, move forward one time step, calculate the forces of all stars in relation to all other stars again, move forward one time step, repeat, repeat, etc. This basically means you have to do on the order of N^2 calculations for each time step. If you use a million stars, that means 1,000,000,000,000 calculations per time step for a small representative number of stars. Ouch.
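The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how MDGRAPE-3 does it: the units (G = 1), the softening term, and the simple semi-implicit Euler update are all my own assumptions, chosen to keep the sketch short.

```python
import math

G = 1.0            # gravitational constant in arbitrary units (assumption)
SOFTENING = 1e-3   # keeps the force finite for near-coincident bodies (assumption)

def accelerations(positions, masses):
    """Direct summation: every body feels every other body, N*(N-1) pair evaluations."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            dist2 = sum(d * d for d in dx) + SOFTENING ** 2
            inv_dist3 = 1.0 / (dist2 * math.sqrt(dist2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_dist3
            pair_evaluations += 1
    return acc, pair_evaluations

def step(positions, velocities, masses, dt):
    """Advance one time step in place; returns the number of pair evaluations."""
    acc, count = accelerations(positions, masses)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += acc[i][k] * dt            # update velocity first
            positions[i][k] += velocities[i][k] * dt      # then move the body
    return count

# Four bodies on a line, all starting at rest.
pos = [[float(i), 0.0, 0.0] for i in range(4)]
vel = [[0.0, 0.0, 0.0] for _ in range(4)]
print("pair evaluations per step:", step(pos, vel, [1.0] * 4, dt=0.01))
```

The inner double loop is the N^2 part, and it is the piece that specialized accelerator chips are meant to speed up.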

Now take the same idea and picture the many thousands of atoms in a protein (say for HIV research, cancer, or my favorite, diabetes research, etc). As the protein folds, every atom affects how all the other atoms move in each time step. So, you once again have an N-Body problem. Now imagine you’re trying to develop a protein that folds to a specific shape so that it fits the receptor of a cancer cell. You might have to try thousands or millions of different initial states to find the one that ends in the needed final state. You’re looking at millions of calculations per trial and millions of trials. And let’s not forget that the bigger the protein, the more atoms, and the calculations needed grow with the square of the atom count. Again, ouch.
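To get a feel for how fast this blows up, here is a small back-of-the-envelope sketch. The atom, step, and trial counts are made-up illustrative numbers, not figures from any real study:

```python
def pair_interactions(n_atoms):
    """Pair evaluations for one time step when every atom affects every other atom."""
    return n_atoms * (n_atoms - 1)

def total_work(n_atoms, n_steps, n_trials):
    """Pair evaluations across all time steps and all trial starting states."""
    return pair_interactions(n_atoms) * n_steps * n_trials

# Doubling the atom count roughly quadruples the work per step.
print(pair_interactions(20_000) / pair_interactions(10_000))

# Hypothetical campaign: 10,000 atoms, 1,000 steps per trial, 1,000 trials.
print(f"{total_work(10_000, 1_000, 1_000):.3e} pair evaluations")
```

Real molecular dynamics codes use cutoffs and other tricks to avoid the full N^2 sum, so treat this as the worst case.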


There are several US companies working on systems similar to the MDGRAPE-3. I’m looking forward to seeing their response to this.