|
|
Advanced Scientific Computing
Highly Scalable FFT and Molecular Dynamics Algorithms for QCDOC
B. Fang, , and Y. Deng , and G. Martyna
The calculation of long-range electrostatic forces has been one of the
biggest challenges in Molecular Dynamics simulation. In parallel
computation, it becomes more severe. In the MDoC package, we adopted the
basic idea of the Smooth Particle Mesh Ewald (SPME) method and
specifically designed and implemented a scalable scheme for QCDOC [1,2].
There are two aspects in this core that we highlight: the parallel
framework for 3D FFT and the employment of a parallel spherical cutoff
FFT in the classic SPME method.
In this SPME package, we specifically designed a 3D complex-to-complex
FFT and real-to-complex FFT to make full use of the unique
six-dimensional torus structure of QCD. The actual running time shows
that this 3D FFT could scale well up to a 4000 node machine on QCDOC,
which is more scalable than the IBM 3D FFT running on BlueGene/L.
The new idea of using a spherical cutoff FFT in the SPME was inspired by
the observation that the approximate error for Euler’s exponential
spline interpolation is not linear, which means that the summation of
inaccurate high order Fourier coefficient components is unnecessary.
This can save considerable communication time when performing SPME on a
parallel machine. The actual running tests show that given a desired
tolerant error, the parallel 3D FFT is reduced as a bottleneck if we
choose an appropriate interpolation order. Figure 1 shows the parallel
efficiency of SPME for the simulation of two biological systems. It
turns out that given a tolerant error (a spherical space cutoff gc),
decreasing the interpolation order n has more effect than
decreasing FFT grid size LFFT in increasing the parallel
efficiency. However, this is only valid if LFFT is of
relatively moderate size.
|
 |
| Figure 1. 3D complex-to-complex FFT efficiency on QCDOC. The
experiment and model performances, measured by the total running
time in solving 323, 643 and 1283
FFTs on different systems, vs. number of nodes. |
 |
| Figure 2. SPME efficiency on QCDOC. Two protein/water systems
were simulated: the β-hairpin
of Streptococcal Protein G (3579 atoms in total) and HIV-1
Protease (29508 atoms in total). Plots (a) and (b) are for
β-hairpin, given two different
tolerance errors (spherical cutoffs gc); (c) and (d) are
for HIV-1. The parameter sets that can provide acceptable
approximation errors are plotted. Briefly speaking, spherical cutoff
FFT is no longer a bottleneck in this parallel implementation, i.e.
if FFT grid size is in relatively moderate size, decreasing
interpolation order can provide much better efficiency. However, the
condition of the above statement cannot be ignored. For instance, in
plot (a) 128 FFT grid size is relatively large and it is the actual
bottleneck instead of the interpolation order 4. However, this is
not the case in plot (c). The same occurs in plot (d), where the 256
FFT grid size is quite large, so that it is the dominant part of the
SPME. |
References
-
[1] Fang, B., Deng, Y., and Martyna, G. Performance of 3D FFT on 6D
QCDOC torus parallel supercomputers. Comp. Phys. Comm. Submitted, 2006.
-
[2] Fang, B., Deng, Y., and Martyna, G. Improving the efficiency and
accuracy of parallel smooth particle mesh Ewald summation for
protein-and-water simulation studies. Comp. Phys. Comm. Submitted, 2006.

Last Modified: April 23, 2009 Please forward all questions about this site to:
Claire Lamberti
|