In magnetic resonance imaging (MRI), non-Cartesian scan trajectories are advantageous in a wide variety of emerging applications. Advanced reconstruction algorithms that operate directly on non-Cartesian scan data using optimality criteria such as least-squares (LS) can produce significantly better images than conventional algorithms that apply a fast Fourier transform (FFT) after interpolating the scan data onto a Cartesian grid. However, advanced LS reconstructions require significantly more computation than conventional reconstructions based on the FFT. For example, one LS algorithm requires nearly six hours to reconstruct a single three-dimensional image on a modern CPU. Our work demonstrates that this advanced reconstruction can be performed quickly and efficiently on a modern GPU, with the reconstruction of a 64^3 3D image requiring just three minutes, an acceptable latency for key applications.
This paper describes how the reconstruction algorithm leverages the resources of the GeForce 8800 GTX (G80) to achieve over 150 GFLOPS in performance. We find that the combination of tiling the data and storing the data in the G80's constant memory dramatically reduces the algorithm's required bandwidth to off-chip memory. The G80's special functional units provide substantial acceleration for the trigonometric computations in the algorithm's inner loops. Finally, experiment-driven code transformations increase the reconstruction's performance by as much as 60% to 80%.
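As a rough illustration of how tiling, constant memory, and hardware trigonometric intrinsics can fit together, the following CUDA sketch accumulates a sum of weighted complex exponentials at each voxel, one tile of scan samples at a time. It is not the paper's implementation. The form of the inner loop, the `Sample` layout, the tile size `TILE`, the kernel name `accumulateTile`, and all data values are assumptions made for this example. The intrinsics `__sinf` and `__cosf` are the fast, reduced-accuracy CUDA variants; whether they execute on special functional units depends on the hardware and compiler.

```cuda
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cuda_runtime.h>

#define TILE 1024          // samples per tile (assumed; 1024 * 16 B = 16 KB)
#define PI2  6.2831853f

// Hypothetical sample layout: k-space coordinates plus a real weight.
struct Sample { float kx, ky, kz, w; };

// One tile of samples is held in constant memory, which is cached on chip,
// so threads reading the same sample do not each go to off-chip memory.
__constant__ Sample c_tile[TILE];

// Each thread owns one voxel and adds the current tile's contribution.
__global__ void accumulateTile(const float* x, const float* y, const float* z,
                               float* re, float* im, int nVoxels, int nInTile)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nVoxels) return;

    float xi = x[i], yi = y[i], zi = z[i];
    float accRe = re[i], accIm = im[i];   // running sums from earlier tiles

    for (int s = 0; s < nInTile; ++s) {
        float arg = PI2 * (c_tile[s].kx * xi + c_tile[s].ky * yi + c_tile[s].kz * zi);
        accRe += c_tile[s].w * __cosf(arg);   // fast intrinsic, lower accuracy
        accIm += c_tile[s].w * __sinf(arg);
    }
    re[i] = accRe;
    im[i] = accIm;
}

int main()
{
    const int nVoxels = 4096, nSamples = 3000;   // made-up problem size
    std::vector<Sample> samples(nSamples);
    std::vector<float> hx(nVoxels), hy(nVoxels), hz(nVoxels);
    for (int s = 0; s < nSamples; ++s)           // made-up trajectory
        samples[s] = { 1e-4f * s, 2e-4f * s, 5e-5f * s, 1.0f };
    for (int i = 0; i < nVoxels; ++i) {          // made-up voxel positions
        hx[i] = 0.01f * (i % 16);
        hy[i] = 0.01f * ((i / 16) % 16);
        hz[i] = 0.01f * (i / 256);
    }

    const size_t bytes = nVoxels * sizeof(float);
    float *dx, *dy, *dz, *dRe, *dIm;
    cudaMalloc(&dx, bytes);  cudaMalloc(&dy, bytes);  cudaMalloc(&dz, bytes);
    cudaMalloc(&dRe, bytes); cudaMalloc(&dIm, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dz, hz.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dRe, 0, bytes);
    cudaMemset(dIm, 0, bytes);

    // Tiling: copy one tile of samples into constant memory, then launch
    // the kernel once per tile; the last tile may be partially filled.
    for (int start = 0; start < nSamples; start += TILE) {
        int n = std::min(TILE, nSamples - start);
        cudaMemcpyToSymbol(c_tile, samples.data() + start, n * sizeof(Sample));
        accumulateTile<<<(nVoxels + 255) / 256, 256>>>(dx, dy, dz, dRe, dIm, nVoxels, n);
    }

    std::vector<float> hRe(nVoxels), hIm(nVoxels);
    cudaMemcpy(hRe.data(), dRe, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hIm.data(), dIm, bytes, cudaMemcpyDeviceToHost);
    printf("voxel 0: re = %f, im = %f (status: %s)\n",
           hRe[0], hIm[0], cudaGetErrorString(cudaGetLastError()));

    cudaFree(dx); cudaFree(dy); cudaFree(dz); cudaFree(dRe); cudaFree(dIm);
    return 0;
}
```

In this sketch, every thread in a block reads the same `c_tile[s]` entry on each iteration, which is the access pattern constant memory serves best. The accuracy of `__sinf` and `__cosf` degrades as the argument grows, so a real reconstruction would need to check that the resulting error is acceptable for its image quality requirements.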