HyperLink   MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores.
Publication Year:
  John A. Stratton, Sam S. Stone, Wen-mei Hwu
  IMPACT Technical Report, IMPACT-08-01, University of Illinois, Urbana, IL, 2008

The CUDA programming model, which is based on an extended ANSI C language and a runtime environment, allows the programmer to specify explicitly data parallel computation. NVIDIA developed CUDA to open the architecture of their graphics accelerators to more general applications, but did not provide an efficient mapping to execute the programming model on any other architecture.

This document describes Multicore-CUDA (MCUDA), a system that efficiently maps the CUDA programming model to a multicore CPU architecture. The major contribution of this work is the source-to-source translation process that converts CUDA code into standard C that interfaces to a runtime library for parallel execution. We apply the MCUDA framework to some CUDA applications previously shown to have high performance on a GPU, and demonstrate high efficiency executing these applications on a multicore CPU architecture. The thread-level parallelism, data locality and computational regularity of the code as expressed in the CUDA model achieve much of the benefit of hand-tuning an application for the CPU architecture. With the MCUDA framework, it is now possible to write data-parallel code in a single programming model for efficient execution on CPU or GPU architectures.