MCUDA: An Efficient Implementation of CUDA Kernels for Multi-Core CPUs
Paper of IMPACT - Cited More Than 250 Times
   
Publication Year:
  2008
Authors:
  John A. Stratton, Sam S. Stone, Wen-mei Hwu
   
Published:
  21st International Workshop on Languages and Compilers for Parallel Computing, LNCS 5335, pp. 16-30, 2008
   
Abstract:
CUDA is a data parallel programming model that supports several key abstractions - thread blocks, hierarchical memory and barrier synchronization - for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs fission to eliminate barrier synchronizations, and converts scalar references to thread-local data to replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
   
BibTeX:
 
@inproceedings{stratton:08:mcuda,
author = "Stratton, John A. and Stone, Sam S. and Hwu, Wen-mei W.",
title = "{MCUDA}: An Efficient Implementation of {CUDA} Kernels for Multi-Core {CPUs}",
booktitle = "Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing",
pages = "16--30",
month = jul,
year = "2008"
}