Data Layout Transformation for Structured-Grid Codes on GPU
Publication Year:
  I-Jui Sung, Wen-mei Hwu
  Workshop on Language, Compiler, and Architecture Support for GPGPU, in conjunction with PPoPP 2010

We present data layout transformation as an effective performance optimization for memory-bound structured-grid applications on GPUs. Structured-grid applications are a class of applications that compute cell values on a regular 2D, 3D, or higher-dimensional grid, where each output point is computed as a function of itself and its nearest neighbors. Stencil codes are an instance of this class. Examples of structured-grid applications include fluid dynamics and heat distribution, which solve partial differential equations with an iterative solver on a dense multidimensional array.
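To make the class concrete, here is a minimal sketch of one iterative sweep of a 2D stencil in C99. The 5-point averaging weights, the fixed boundary, and the name `jacobi_step` are hypothetical choices for illustration, not taken from the paper's benchmarks. The C99 variable-length array parameters `in[ny][nx]` let the code index a grid whose size is known only at run time.

```c
#include <assert.h>

/* One sweep of a hypothetical 5-point averaging stencil on an ny x nx grid.
   The C99 variable-length array parameters carry the run-time row length nx,
   so in[j][i] indexing works for grid sizes known only at run time. */
static void jacobi_step(int ny, int nx, float in[ny][nx], float out[ny][nx])
{
    assert(ny > 0 && nx > 0);
    for (int j = 0; j < ny; j++) {
        for (int i = 0; i < nx; i++) {
            if (j == 0 || j == ny - 1 || i == 0 || i == nx - 1) {
                out[j][i] = in[j][i];  /* boundary cells are held fixed */
            } else {
                /* each output depends on itself and its four nearest neighbors */
                out[j][i] = 0.2f * (in[j][i] + in[j - 1][i] + in[j + 1][i]
                                    + in[j][i - 1] + in[j][i + 1]);
            }
        }
    }
}
```

For instance, with a 5 × 5 grid that is zero except for `in[2][2] = 10`, one sweep yields 2 at the center and at its four interior neighbors and 0 everywhere else. On a GPU, the two loops would typically become a grid of threads, one per output point, which is why the placement of neighboring elements in memory matters.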
Using the information available through variable-length array syntax, standardized in C99 and supported by other modern languages, we have enabled automatic data layout transformations for structured-grid codes with dynamic array sizes. We first present a formulation that enables automatic data layout transformations for structured-grid code in CUDA. We then model the DRAM banking and interleaving scheme of the GTX280 GPU through microbenchmarking. We developed a layout transformation methodology that, given a model of the memory system, statically chooses a good layout. The resulting transformation distributes concurrent memory requests evenly across DRAM channels and banks, and it provides substantial speedups for structured-grid applications by improving their memory-level parallelism.
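The effect of layout on channel-level parallelism can be illustrated with a toy address model. This is a minimal sketch under loudly stated assumptions: it is not the microbenchmarked GTX280 model from the paper, it hypothetically assumes 8 DRAM channels with addresses interleaved every 256 bytes, it ignores banks, and it uses simple row padding as a stand-in layout transformation rather than the paper's method. All names (`NUM_CHANNELS`, `INTERLEAVE`, `channel_of`, `channels_touched`, `gcd_size`, `padded_pitch`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory-system model (illustrative values, not measured):
   consecutive INTERLEAVE-byte chunks of the address space are assigned
   round-robin to NUM_CHANNELS DRAM channels. */
enum { NUM_CHANNELS = 8, INTERLEAVE = 256 };

static int channel_of(size_t byte_addr)
{
    return (int)((byte_addr / INTERLEAVE) % NUM_CHANNELS);
}

/* Number of distinct channels hit when nreq concurrent requests read
   column col of rows 0..nreq-1 of a float grid whose rows are
   `pitch` elements apart. */
static int channels_touched(size_t pitch, size_t col, int nreq)
{
    int hit[NUM_CHANNELS] = {0};
    int distinct = 0;
    for (int r = 0; r < nreq; r++) {
        size_t addr = ((size_t)r * pitch + col) * sizeof(float);
        int ch = channel_of(addr);
        if (!hit[ch]) {
            hit[ch] = 1;
            distinct++;
        }
    }
    return distinct;
}

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Stand-in layout transformation (simple padding): round the row length
   up to whole interleave chunks, then add chunks until the row length in
   chunks is coprime with NUM_CHANNELS, so that any NUM_CHANNELS
   consecutive rows start on distinct channels. */
static size_t padded_pitch(size_t nx)
{
    assert(nx > 0);
    size_t chunk = INTERLEAVE / sizeof(float);  /* floats per chunk (64 here) */
    size_t chunks = (nx + chunk - 1) / chunk;
    while (gcd_size(chunks, NUM_CHANNELS) != 1)
        chunks++;
    return chunks * chunk;
}
```

With `nx = 512` floats, unpadded rows are 2048 bytes apart (exactly one full rotation over the assumed 8 channels), so `channels_touched(512, 0, 8)` returns 1: all eight concurrent requests queue on one channel. `padded_pitch(512)` returns 576, and the same eight requests then land on eight distinct channels. Padding trades a small amount of memory for memory-level parallelism; a fuller model would also account for DRAM banks, as the abstract describes.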