Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications.
Publication Year: 2009
  Sara Sadeghi Baghsorkhi, Matthieu Delahaye, William D. Gropp, Wen-mei Hwu
  In the Workshop on Exploiting Parallelism Using GPUs and Other Hardware-Assisted Methods (associated with CGO '09), March 2009

In this paper we present an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and help it narrow the search to the more promising implementations. This work is based on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). We analyze each CUDA kernel and generate the corresponding string model, which is a concise representation of the operations of a kernel. The string model for a kernel summarizes how the kernel exercises major GPU microarchitecture features. Based on the string model, we estimate the average execution time of a warp, which is the SIMD work granularity for CUDA. We validated the performance model using a few data-parallel benchmarks that exploit different microarchitecture features of the GPU architecture. The model captures full system complexity and shows high accuracy in predicting the performance trend of different optimized implementations. We also describe our approach to extracting the performance model automatically.