Parallel Heisenberg Spin Model on Supercomputer Architectures
M. McGuigan and R. Bennett 

The Heisenberg spin model we studied in [1] is used to describe magnetic materials. The model can be extended to a large number of atoms for comparison with the bulk properties of magnetic materials, whose measurement can involve millions or billions of atoms. In the future we will link our study of the Heisenberg model with density functional calculations of its parameters. An important application area is high-density storage on nanomagnetic materials.

We measured the performance of a parallel Heisenberg spin model, simulated with the Monte Carlo method and the Metropolis algorithm, on several supercomputer architectures: IBM BlueGene/L, the PSC Quadrics Cluster, SGI Altix and QCDOC. This set of systems spans a variety of supercomputer approaches, including shared memory, Linux clusters and special-purpose machines. BlueGene/L, although originally envisioned as a computer for biomolecular simulation, has an efficient implementation of MPI and can be applied to a variety of problems. The PSC Quadrics Cluster is a general-purpose Linux cluster with a Quadrics interconnect. SGI Altix is a shared-memory system with a NUMAflex interconnect that can also be applied to distributed problems. QCDOC was originally built as a specialized machine for lattice QCD, but it has an efficient message-passing library called QMP and can be applied to a wide range of applications, including computational biology [2] and nanoscience.

The Heisenberg spin model of magnetism is defined by the energy

$$E = -J \sum_{\langle i,j \rangle_{nn}} \mathbf{S}_i \cdot \mathbf{S}_j \qquad (1)$$

where $\mathbf{S}_i$ is a three-component spin at lattice site $i = (i_1, i_2, i_3)$, the sum runs over nearest-neighbor pairs of lattice sites, and $J$ is the nearest-neighbor coupling. The number of lattice sites (spins) along each direction is $L$. We parallelized the Heisenberg model using domain decomposition on lattices of up to 16,777,216 sites ($256^3$). Because the statistical error of a Monte Carlo estimate decreases as more Monte Carlo steps of the Metropolis algorithm are taken, we studied the number of Monte Carlo steps per second that can be achieved on each supercomputer architecture. Parallel Monte Carlo for the Ising model was studied previously in [3], where formulas for the number of Monte Carlo steps per second were obtained.
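To make the update rule concrete, here is a minimal serial sketch of one Metropolis Monte Carlo step (one attempted update per site) for this model. It assumes the convention $E = -J \sum_{nn} \mathbf{S}_i \cdot \mathbf{S}_j$ with unit-length spins, periodic boundaries, a temperature `T` in units where $k_B = 1$, and trial moves that draw a fresh random spin direction; all function names are hypothetical and this is not the authors' code. A domain-decomposed parallel version would give each processor a block of the lattice and exchange boundary spins with its neighbors, but the source does not describe how updates were ordered across processors.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a point uniformly on the unit sphere (a random spin direction)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def neighbors(site, L):
    """The six nearest neighbors of site (i1, i2, i3) on a periodic L^3 lattice."""
    i1, i2, i3 = site
    return [
        ((i1 + 1) % L, i2, i3), ((i1 - 1) % L, i2, i3),
        (i1, (i2 + 1) % L, i3), (i1, (i2 - 1) % L, i3),
        (i1, i2, (i3 + 1) % L), (i1, i2, (i3 - 1) % L),
    ]


def energy(spins, L, J):
    """E = -J * sum of S_i . S_j over nearest-neighbor pairs, each pair counted once."""
    E = 0.0
    for (i1, i2, i3), s in spins.items():
        # Only the +x, +y, +z bonds, so every pair is visited exactly once.
        for n in (((i1 + 1) % L, i2, i3), (i1, (i2 + 1) % L, i3), (i1, i2, (i3 + 1) % L)):
            E -= J * dot(s, spins[n])
    return E


def delta_energy(spins, site, new, L, J):
    """Energy change if the spin at `site` is replaced by `new`."""
    h = [0.0, 0.0, 0.0]  # local field: sum of the six neighboring spins
    for n in neighbors(site, L):
        for k in range(3):
            h[k] += spins[n][k]
    return -J * (dot(new, h) - dot(spins[site], h))


def metropolis_sweep(spins, L, J, T, rng):
    """One Monte Carlo step: one Metropolis trial per site; returns the number accepted."""
    accepted = 0
    for site in spins:
        new = random_unit_vector(rng)
        dE = delta_energy(spins, site, new, L, J)
        # Accept downhill moves always, uphill moves with probability exp(-dE/T).
        if dE <= 0.0 or rng.random() < math.exp(-dE / T):
            spins[site] = new
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    L, J, T = 8, 1.0, 1.0
    spins = {(i1, i2, i3): random_unit_vector(rng)
             for i1 in range(L) for i2 in range(L) for i3 in range(L)}
    for step in range(20):
        metropolis_sweep(spins, L, J, T, rng)
    print("energy per spin:", energy(spins, L, J) / L**3)
```

Because the trial direction is drawn independently of the current spin, the proposal is symmetric and the plain Metropolis acceptance rule applies; only the six neighbors enter `delta_energy`, which is why the per-site work is local and the lattice decomposes naturally across processors.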

Two of the main properties of parallel computation we studied were strong scaling and weak scaling. Strong scaling means that we fix the problem size, vary the number of processors, and measure the speedup. This is very important for Monte Carlo simulations because one way of reducing the error in a Monte Carlo simulation is to increase the number of Monte Carlo steps, and this can be done in a reasonable time frame by increasing the number of Monte Carlo steps per second. Weak scaling means that we grow the problem size in proportion to the number of processors, so that the work per processor stays fixed; ideally the execution time then stays the same. In this way we can obtain higher lattice resolution in the same amount of wall-clock time. Results from a weak scaling study on QCDOC as a function of the number of processors are shown in Figure 1, with performance on the vertical axis measured in Monte Carlo steps per second. The results are in excellent agreement with ideal weak scaling.


Figure 1.  Weak scaling of QCDOC as a function of the number of processors. Performance on the vertical axis is measured in the number of Monte Carlo steps per second.
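The two scaling measures defined above can be computed directly from measured rates in Monte Carlo steps per second. The sketch below uses hypothetical helper names and takes the smallest processor count measured as the baseline; this is one common way to define speedup and efficiency, not necessarily the one used in the study.

```python
def strong_scaling(rates):
    """Speedup and parallel efficiency at a fixed lattice size.

    rates maps processor count P -> Monte Carlo steps per second.
    The smallest P measured is used as the baseline.
    """
    p0 = min(rates)
    result = {}
    for p in sorted(rates):
        speedup = rates[p] / rates[p0]
        efficiency = speedup * p0 / p  # 1.0 means perfect linear speedup
        result[p] = (speedup, efficiency)
    return result


def weak_scaling_efficiency(rates):
    """Efficiency when the number of sites per processor is held fixed.

    With ideal weak scaling the wall-clock time per step, and hence the
    steps per second, stays constant, so the efficiency stays at 1.0.
    """
    p0 = min(rates)
    return {p: rates[p] / rates[p0] for p in sorted(rates)}
```

For example, with made-up rates `strong_scaling({1: 2.0, 4: 6.0})` returns `{1: (1.0, 1.0), 4: (3.0, 0.75)}`: a speedup of 3 on 4 processors, or 75% efficiency. A flat curve like the one in Figure 1 corresponds to `weak_scaling_efficiency` values near 1.0.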

Using the data from the study we were able to fit a Laurent expansion of the form

$$(\text{Steps/s})^{-1} = \text{Time} = \frac{aL^3}{P} + \frac{bL^2}{P^{2/3}} + c \qquad (2)$$

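where $P$ is the number of processors and $a$, $b$ and $c$ are fitted constants. One way to understand this formula is that the first term represents the time spent in computation, proportional to the $L^3/P$ sites on each processor, while the other two represent time spent in communication between processors: the second term is proportional to $(L/P^{1/3})^2$, the face area of a cubic subdomain of side $L/P^{1/3}$, and $c$ is independent of both $L$ and $P$. This form is general enough to include the Ising model parallel performance derived in [3] as well as the Heisenberg model. The equation is consistent with weak scaling: since $L^2/P^{2/3} = (L^3/P)^{2/3}$, the predicted time per step depends on $L$ and $P$ only through the number of sites per processor $L^3/P$, so increasing $L^3$ in proportion to $P$ leaves it unchanged.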
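Because formula (2) is linear in $a$, $b$ and $c$, the coefficients can be fitted by ordinary least squares on the time per step. The sketch below is written under that assumption (the source does not say which fitting method was used), with hypothetical function names; it solves the 3 x 3 normal equations after rescaling each column, since $L^3/P$ and the constant term can differ by many orders of magnitude. Once fitted, formula (2) predicts the rate for any $L$ and $P$; at fixed $L$ the predicted rate approaches $1/c$ as $P$ grows.

```python
def basis(L, P):
    """Regressors of formula (2): time per step = a*x[0] + b*x[1] + c*x[2]."""
    return (L**3 / P, L**2 / P ** (2.0 / 3.0), 1.0)


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [v] for row, v in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_time_model(samples):
    """Least-squares fit of (a, b, c) in formula (2).

    samples: list of (L, P, steps_per_second) measurements.
    Columns are rescaled to at most 1 before forming the normal equations.
    """
    rows = [basis(L, P) for L, P, _ in samples]
    times = [1.0 / rate for _, _, rate in samples]  # (steps/s)^-1 = time per step
    scale = [max(abs(r[j]) for r in rows) for j in range(3)]
    AtA = [[0.0] * 3 for _ in range(3)]
    Aty = [0.0] * 3
    for r, t in zip(rows, times):
        xs = [r[j] / scale[j] for j in range(3)]
        for i in range(3):
            Aty[i] += xs[i] * t
            for j in range(3):
                AtA[i][j] += xs[i] * xs[j]
    coef = solve(AtA, Aty)
    # Undo the column scaling to recover a, b, c in the original units.
    return tuple(coef[j] / scale[j] for j in range(3))


def predicted_rate(a, b, c, L, P):
    """Monte Carlo steps per second predicted by formula (2)."""
    x = basis(L, P)
    return 1.0 / (a * x[0] + b * x[1] + c * x[2])
```

Feeding in `(L, P, steps_per_second)` triples from a scaling study and then calling `predicted_rate(a, b, c, 256, 4096)` mirrors the kind of extrapolation described in the next paragraph.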

Fitting the data to formula (2), we can estimate the performance of the various architectures on large numbers of processors. Figure 2 shows the result for the $256^3$ lattice on BlueGene/L, the PSC Quadrics Cluster, SGI Altix and QCDOC applied to the Heisenberg spin model. All systems showed excellent parallel performance. For example, our results indicate that 60 Monte Carlo steps per second on a $256^3$ lattice are possible on a 4096-node BlueGene/L system. This is a huge performance gain and will allow large spin systems to be studied on the order of a day. On a workstation a similar run could take over a year, making comparison with experiment highly problematic. Our results demonstrate a dramatic increase in productivity in the study of magnetic systems using these leading supercomputer architectures.

Figure 2.  Strong scaling performance curves for various supercomputer architectures, from measurements and fits to formula (2).
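The headline BlueGene/L figure can be unpacked with simple arithmetic. The sketch below (hypothetical function name) only reuses the numbers quoted in the text, namely a $256^3$ lattice, 4096 nodes and 60 Monte Carlo steps per second; the 365-day year is a rounding choice.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365


def throughput_summary(L, nodes, steps_per_second):
    """Back-of-envelope numbers for a run on a cubic L^3 lattice."""
    sites = L ** 3
    return {
        "sites": sites,
        "sites_per_node": sites / nodes,
        "steps_per_day": steps_per_second * SECONDS_PER_DAY,
        # Each Monte Carlo step attempts one update per site.
        "spin_updates_per_second": steps_per_second * sites,
        # A machine that needs more than a year for the steps done here in
        # one day must run slower than this many steps per second.
        "year_long_rate_bound": steps_per_second / DAYS_PER_YEAR,
    }


if __name__ == "__main__":
    for key, value in throughput_summary(256, 4096, 60.0).items():
        print(f"{key}: {value:,.3f}")
```

At 60 steps per second a day-long run covers about 5.2 million Monte Carlo steps, or roughly $10^9$ single-spin updates per second across the machine with 4096 spins per node; a workstation that needs over a year for the same run is managing fewer than about 0.16 steps per second.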

References

  • [1] Bennett, R. and McGuigan, M. Parallel Heisenberg spin model performance on supercomputer architectures. ACM 2006 Symposium on Principles and Practice of Parallel Programming, Sept 2005, submitted.
  • [2] Deng, Y., Glimm, J., Davenport, J., Cai, X., and Santos, E. Performance models on QCDOC for molecular dynamics with Coulomb potentials. Int. J. High Performance Computing Applications 18(2): 183-198 (2004).
  • [3] Santos, E. and Muthukrishnan, G. Efficient simulation based on sweep selection for 2D and 3D Ising spin models on hierarchical clusters. Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04) (2004).


Last Modified: January 31, 2008
Please forward all questions about this site to: Claire Lamberti