Automatic execution of single-GPU computations across multiple GPUs
Publication Year: 2014
  Javier Cabezas, Lluis Vilanova, Isaac Gelado, Thomas B. Jablin, Nacho Navarro, Wen-mei Hwu
  Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT '14)

We present AMGE, a programming framework and runtime system to decompose data and GPU kernels and execute them on multiple GPUs concurrently. AMGE exploits the remote memory access capability of recent GPUs to guarantee data accessibility regardless of its physical location, thus allowing AMGE to safely decompose and distribute arrays across GPU memories. AMGE also includes a compiler analysis to detect array access patterns in GPU kernels. The runtime uses this information to automatically choose the best computation and data distribution configuration. Through effective use of GPU caches, AMGE achieves good scalability in spite of the limited interconnect bandwidth between GPUs. Results show 1.95x and 3.73x execution speedups for 2 and 4 GPUs for a wide range of dense computations compared to the original versions on a single GPU.
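The abstract's key enabler is the remote memory access capability of recent GPUs: once peer access is enabled, a kernel on one GPU can dereference pointers into another GPU's memory, so a decomposed array remains accessible regardless of where each partition physically resides. The sketch below (not AMGE's actual API; the kernel, sizes, and partitioning are illustrative assumptions) shows the underlying CUDA mechanism: enabling peer access between two devices and running each half of a row-wise decomposed vector addition on its own GPU.

```cuda
// Illustrative sketch only -- not AMGE's API. Shows the peer (remote)
// memory access mechanism AMGE builds on, with a manual decomposition.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1 << 20, half = N / 2;

    // Allow GPU 0 and GPU 1 to read each other's memory directly, so a
    // distributed array stays accessible wherever each piece is placed.
    cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

    // Each half of the arrays lives in a different GPU's memory.
    float *a[2], *b[2], *c[2];
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&a[d], half * sizeof(float));
        cudaMalloc(&b[d], half * sizeof(float));
        cudaMalloc(&c[d], half * sizeof(float));
    }

    // Each GPU computes on its local partition; with peer access enabled,
    // a kernel could equally dereference the other GPU's pointers, which
    // is what makes automatic decomposition safe.
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        vecAdd<<<(half + 255) / 256, 256>>>(a[d], b[d], c[d], half);
    }
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }

    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaFree(a[d]); cudaFree(b[d]); cudaFree(c[d]);
    }
    return 0;
}
```

AMGE automates what this sketch does by hand: its compiler analysis detects the kernel's array access pattern, and the runtime picks the decomposition, relying on remote access (and GPU caches, given the limited inter-GPU bandwidth) to serve the accesses that cross partition boundaries.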

@inproceedings{Cabezas:2014:AES:2628071.2628109,
  author    = {Cabezas, Javier and Vilanova, Llu\'{\i}s and Gelado, Isaac and Jablin, Thomas B. and Navarro, Nacho and Hwu, Wen-mei},
  title     = {Automatic Execution of Single-GPU Computations Across Multiple GPUs},
  booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation},
  series    = {PACT '14},
  year      = {2014},
  isbn      = {978-1-4503-2809-8},
  location  = {Edmonton, AB, Canada},
  pages     = {467--468},
  numpages  = {2},
  url       = {http://doi.acm.org/10.1145/2628071.2628109},
  doi       = {10.1145/2628071.2628109},
  acmid     = {2628109},
  publisher = {ACM},
  address   = {New York, NY, USA},
  keywords  = {multi-gpu programming, numa},
}