
Computer Science and Mathematics Projects

Automatic Parallelization and Optimization for Lattice QCD Software using a Source-to-Source Compiler

[Figure: Visualization of lattice QCD simulations.]

With new computing architectures appearing at a rapid pace, existing application codes must constantly be adapted to exploit the full potential of new high-performance computing systems. Reengineering these codes by hand not only requires a significant investment of human effort but is also error-prone, which lengthens the development cycle. Automatic parallelization and optimization tools can play a significant role in this adaptation, potentially shortening the development cycle while achieving high performance on the target architecture.

Using a numerical lattice quantum chromodynamics (LQCD) software suite as a case study, this project investigates the applicability and efficiency of high-level source-to-source compilers for high-performance scientific computing. LQCD simulations are dominated by matrix-vector multiplications, so we use the R-Stream compiler, developed by Reservoir Labs Inc., to parallelize and optimize the matrix-vector (matvec) kernel of a sequential LQCD code written in C. The C code that R-Stream outputs expresses loop-level parallelism with OpenMP and applies optimizations such as tiling, loop unrolling and direct memory access (DMA) management; the user can then tune it further for optimal performance.
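To illustrate what loop-level OpenMP parallelism combined with tiling looks like for a matvec kernel, here is a minimal hand-written sketch in C. It is not R-Stream output: the function name `matvec_tiled`, the dense row-major matrix layout and the tile size of 64 are assumptions chosen for illustration, and loop unrolling and memory management are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size, fixed by hand for this sketch. */
#define TILE 64

/* Computes y = A * x for a dense n-by-n row-major matrix A.
   Row tiles are distributed across OpenMP threads (the pragma is
   ignored when compiling without OpenMP, giving a serial loop).
   Within a row tile, columns are processed in blocks of TILE so that
   one block of x can stay in cache while it is reused by every row
   of the tile. */
void matvec_tiled(size_t n, const double *A, const double *x, double *y)
{
    assert(n == 0 || (A != NULL && x != NULL && y != NULL));

    #pragma omp parallel for schedule(static)
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t iend = ii + TILE < n ? ii + TILE : n;

        /* Each thread owns rows [ii, iend), so no synchronization is needed. */
        for (size_t i = ii; i < iend; i++)
            y[i] = 0.0;

        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t jend = jj + TILE < n ? jj + TILE : n;
            for (size_t i = ii; i < iend; i++) {
                double sum = y[i];
                for (size_t j = jj; j < jend; j++)
                    sum += A[i * n + j] * x[j];
                y[i] = sum;
            }
        }
    }
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), the row tiles are split across threads; without it, the function runs serially and produces the same result. In tool-generated code the tile size would presumably be derived from the target's cache sizes rather than fixed by hand.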


Our first target architecture was Intel CPUs. After SIMD instructions were incorporated into the code generated by R-Stream, the resulting C code ran up to 40 times faster than the input code. Further improvements to the output code are under way and are expected to yield even better performance. To support user-in-the-loop optimization, we have also developed a visualizer.
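The page does not say how the SIMD instructions were added. One common approach in LQCD codes, sketched here under that assumption, is to store the data as a structure of arrays so that consecutive lattice sites fall into consecutive SIMD lanes, and then vectorize the loop over sites. The example applies a per-site complex 3x3 matrix to a 3-component complex vector and uses the portable OpenMP `simd` directive instead of hardware-specific intrinsics; the function name `su3_matvec_soa` and the index layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Computes w = U * v at every lattice site s, where U is a complex 3x3
   matrix and v a complex 3-vector per site.
   Hypothetical structure-of-arrays layout, real and imaginary parts
   in separate arrays:
     matrix entry (a, b) of site s  -> index (3*a + b) * nsites + s
     vector component c of site s   -> index c * nsites + s        */
void su3_matvec_soa(size_t nsites,
                    const double *restrict ure, const double *restrict uim,
                    const double *restrict vre, const double *restrict vim,
                    double *restrict wre, double *restrict wim)
{
    assert(nsites == 0 || (ure && uim && vre && vim && wre && wim));

    for (size_t a = 0; a < 3; a++) {
        /* Iterations over s are independent and access memory with unit
           stride, so this loop can be vectorized with one site per lane. */
        #pragma omp simd
        for (size_t s = 0; s < nsites; s++) {
            double accre = 0.0, accim = 0.0;
            for (size_t b = 0; b < 3; b++) {
                double are = ure[(3 * a + b) * nsites + s];
                double aim = uim[(3 * a + b) * nsites + s];
                double bre = vre[b * nsites + s];
                double bim = vim[b * nsites + s];
                /* Complex multiply-accumulate: (are + i*aim) * (bre + i*bim). */
                accre += are * bre - aim * bim;
                accim += are * bim + aim * bre;
            }
            wre[a * nsites + s] = accre;
            wim[a * nsites + s] = accim;
        }
    }
}
```

With GCC or Clang, `-fopenmp-simd` makes the compiler honor the `simd` directive without enabling threading; compilers may also auto-vectorize the site loop without the directive, since its iterations are independent.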


E. Papenhausen, B. Wang, M. H. Langston, M. Baskaran, T. Henretty, T. Izubuchi, A. Johnson, C. Jung, M. Lin, B. Meister, K. Mueller and R. Lethin, Polyhedral user mapping and assistant visualizer tool for the R-Stream auto-parallelizing compiler, in Proc. VISSOFT, pp. 180–184, IEEE, 2015.

M. Lin, E. Papenhausen, M. H. Langston, B. Meister, M. Baskaran, T. Izubuchi and C. Jung, Optimizing the domain wall fermion Dirac operator using the R-Stream source-to-source compiler, to appear in Proceedings of Science, LATTICE2015 (2015) 022.