
Center for Data-Driven Discovery Projects

High Performance Iterative Tomography Reconstructions on GPU and Intel Xeon Phi Coprocessor 

In this STTR Phase I application, we provided a budget-aware high performance computing (HPC) solution for three-dimensional (3D) tomographic image reconstruction, a popular but computationally intensive approach widely used in scientific and industrial fields including physics, chemistry, biology, and medicine. We focused our investigation on iterative methods, the most challenging algorithms in the field. These methods can handle incomplete data, a common situation in applications such as low-dose imaging in clinical radiology and limited-angle acquisition in synchrotron science. Our work included customized HPC implementations of two representative tomographic reconstruction algorithms on two platforms that cost less than traditional supercomputers and many-core CPU clusters: the NVIDIA GPU and the Intel Xeon Phi coprocessor. In addition, as a proof of concept, we carried out a comprehensive hardware performance analysis based on a range of criteria including budget plan, implementation difficulty, and data scale.

Two iterative CT reconstruction algorithms, PML/OSPML and OS-SIRT, were hardware accelerated on both an NVIDIA GPU (Tesla K20) and a Xeon Phi coprocessor (31S1P). CUDA 7.5 and Nsight were used for the GPU platform, and Intel Cilk was used for the coprocessor platform. We established the method, strategy, and roadmap for migrating from a general CPU platform and compared the performance of the platforms. For this Phase I study, in order to provide a proof of concept, we released only the efficient acceleration solutions. Our results indicated a performance improvement of one to two orders of magnitude for both algorithms, with the GPU outperforming the coprocessor by about 10%. The implementation was also ported to the TomoPy software package, so that TomoPy users can invoke it seamlessly. Performance can be optimized further with advanced acceleration techniques, which are available to advanced users only.
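To make the structure of an ordered-subsets SIRT update concrete, the following is a minimal CPU sketch in NumPy. It is an illustration only, not the project's GPU or Xeon Phi code: the function name `os_sirt`, the dense system matrix `A`, the interleaved choice of subsets, and the `relax` parameter are all assumptions introduced here. Each subset update has the form x ← x + relax · C Aₛᵀ R (bₛ − Aₛ x), where R and C hold the inverse row and column sums of the subset matrix Aₛ.

```python
import numpy as np


def os_sirt(A, b, n_subsets=2, n_iters=50, relax=1.0):
    """Illustrative OS-SIRT on a small dense system A x = b.

    A: (n_rays, n_pixels) nonnegative system matrix (hypothetical, dense).
    b: (n_rays,) measured projection values.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n_rays, n_pixels = A.shape
    x = np.zeros(n_pixels)

    # Interleaved ordered subsets of rays: subset s takes rows s, s+n_subsets, ...
    subsets = [np.arange(s, n_rays, n_subsets) for s in range(n_subsets)]

    for _ in range(n_iters):
        for idx in subsets:
            A_s = A[idx]
            row_sums = A_s.sum(axis=1)
            col_sums = A_s.sum(axis=0)
            # Inverse row/column sums; entries with a zero sum stay at zero.
            inv_r = np.divide(1.0, row_sums,
                              out=np.zeros_like(row_sums), where=row_sums > 0)
            inv_c = np.divide(1.0, col_sums,
                              out=np.zeros_like(col_sums), where=col_sums > 0)
            # Forward-project, weight the residual, back-project, update.
            residual = b[idx] - A_s @ x
            x = x + relax * inv_c * (A_s.T @ (inv_r * residual))
    return x


# Tiny consistent test problem (made up for illustration).
A_demo = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 0],
                   [0, 1, 1],
                   [1, 0, 1]], dtype=float)
x_true = np.array([1.0, 2.0, 3.0])
b_demo = A_demo @ x_true
x_rec = os_sirt(A_demo, b_demo, n_subsets=2, n_iters=50)
print(x_rec)  # close to x_true for this consistent system
```

In a real reconstruction the system matrix is far too large to store densely, so the products `A_s @ x` and `A_s.T @ (...)` are replaced by forward-projection and back-projection routines. Those two operations dominate the run time, which is why they are the natural targets for GPU or coprocessor acceleration.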

tomographic reconstruction