Advanced Scientific Computing

Highly Scalable FFT and Molecular Dynamics Algorithms for QCDOC
B. Fang, , and Y. Deng , and G. Martyna


The calculation of long-range electrostatic forces has been one of the biggest challenges in Molecular Dynamics simulation. In parallel computation, it becomes more severe. In the MDoC package, we adopted the basic idea of the Smooth Particle Mesh Ewald (SPME) method and specifically designed and implemented a scalable scheme for QCDOC [1,2]. There are two aspects in this core that we highlight: the parallel framework for 3D FFT and the employment of a parallel spherical cutoff FFT in the classic SPME method.

In this SPME package, we specifically designed a 3D complex-to-complex FFT and real-to-complex FFT to make full use of the unique six-dimensional torus structure of QCD. The actual running time shows that this 3D FFT could scale well up to a 4000 node machine on QCDOC, which is more scalable than the IBM 3D FFT running on BlueGene/L.

The new idea of using a spherical cutoff FFT in the SPME was inspired by the observation that the approximate error for Euler’s exponential spline interpolation is not linear, which means that the summation of inaccurate high order Fourier coefficient components is unnecessary. This can save considerable communication time when performing SPME on a parallel machine. The actual running tests show that given a desired tolerant error, the parallel 3D FFT is reduced as a bottleneck if we choose an appropriate interpolation order. Figure 1 shows the parallel efficiency of SPME for the simulation of two biological systems. It turns out that given a tolerant error (a spherical space cutoff gc), decreasing the interpolation order n has more effect than decreasing FFT grid size LFFT in increasing the parallel efficiency. However, this is only valid if LFFT is of relatively moderate size.

Click to enlarge image.

Figure 1. 3D complex-to-complex FFT efficiency on QCDOC. The experiment and model performances, measured by the total running time in solving 323, 643 and 1283 FFTs on different systems, vs. number of nodes.
Click to enlarge image.
Figure 2. SPME efficiency on QCDOC. Two protein/water systems were simulated: the  β-hairpin of Streptococcal Protein G (3579 atoms in total) and HIV-1 Protease (29508 atoms in total). Plots (a) and (b) are for β-hairpin, given two different tolerance errors (spherical cutoffs gc); (c) and (d) are for HIV-1. The parameter sets that can provide acceptable approximation errors are plotted. Briefly speaking, spherical cutoff FFT is no longer a bottleneck in this parallel implementation, i.e. if FFT grid size is in relatively moderate size, decreasing interpolation order can provide much better efficiency. However, the condition of the above statement cannot be ignored. For instance, in plot (a) 128 FFT grid size is relatively large and it is the actual bottleneck instead of the interpolation order 4. However, this is not the case in plot (c). The same occurs in plot (d), where the 256 FFT grid size is quite large, so that it is the dominant part of the SPME.

References

  • [1] Fang, B., Deng, Y., and Martyna, G. Performance of 3D FFT on 6D QCDOC torus parallel supercomputers. Comp. Phys. Comm. Submitted, 2006.
  • [2] Fang, B., Deng, Y., and Martyna, G. Improving the efficiency and accuracy of parallel smooth particle mesh Ewald summation for protein-and-water simulation studies. Comp. Phys. Comm. Submitted, 2006.


 



 

 

Top of Page

Last Modified: January 31, 2008
Please forward all questions about this site to: Claire Lamberti