An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems
Paper of IMPACT - Cited Greater than 150 Times
Publication Year: 2010
  Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay J. Patel, Wen-mei Hwu
  The ACM/IEEE 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10), Pittsburgh, PA, March 13-17, 2010

Heterogeneous computing combines general-purpose CPUs with
accelerators to efficiently execute both the sequential,
control-intensive phases and the data-parallel phases of
applications. Existing programming models for heterogeneous
computing rely on programmers to explicitly manage data transfers
between the CPU system memory and the accelerator memory.
This paper presents a new programming model for heterogeneous
computing, called Asymmetric Distributed Shared Memory (ADSM), that
maintains a shared logical memory space for CPUs to access objects in
the accelerator physical memory, but not vice versa. The asymmetry
allows lightweight implementations that avoid common pitfalls of
symmetrical distributed shared memory systems. ADSM allows
programmers to assign data objects to performance-critical methods.
When a method is selected for accelerator execution, its associated
data objects are allocated within the shared logical memory space,
which is hosted in the accelerator physical memory and is
transparently accessible by the methods executed on CPUs.
We argue that ADSM reduces programming effort for heterogeneous
computing systems and enhances application portability. We present a
software implementation of ADSM, called GMAC, on top of CUDA in a
GNU/Linux environment. We show that applications written in ADSM and
running on top of GMAC achieve performance comparable to their
counterparts using programmer-managed data transfers. This paper
presents the GMAC system and evaluates different design choices. We
further suggest additional architectural support that will likely
allow GMAC to achieve higher application performance than the current
CUDA model.
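
The contrast the abstract draws between programmer-managed transfers and ADSM can be illustrated with a small host-only C++ sketch. It is a toy model written for this page, not GMAC's or CUDA's API: `ToyAccelerator`, `copy_in`, `copy_out`, `scale`, and `toy_adsm_alloc` are all hypothetical names, and the "accelerator memory" is simulated with an ordinary `std::vector` in host RAM. In a real ADSM run-time the system would keep CPU and accelerator views coherent behind the scenes; here the shared pointer simply aliases the simulated accelerator memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator with its own physical memory.
struct ToyAccelerator {
    std::vector<float> memory;  // simulated accelerator physical memory
    std::size_t copies = 0;     // explicit transfers issued by the programmer

    // Explicit-transfer model: the programmer moves data in both directions.
    void copy_in(const float* host, std::size_t n) {
        memory.assign(host, host + n);
        ++copies;
    }
    void copy_out(float* host, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) host[i] = memory[i];
        ++copies;
    }

    // "Kernel": runs on the accelerator and touches only accelerator memory.
    void scale(float k) {
        for (float& x : memory) x *= k;
    }
};

// ADSM-style allocation (hypothetical): the object lives in accelerator
// memory, and the CPU receives an ordinary pointer to it.
float* toy_adsm_alloc(ToyAccelerator& acc, std::size_t n) {
    assert(n > 0);
    acc.memory.assign(n, 0.0f);
    return acc.memory.data();
}

// Programmer-managed transfers: separate host buffer plus two copies.
float explicit_model(ToyAccelerator& acc) {
    float host[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    acc.copy_in(host, 4);
    acc.scale(2.0f);
    acc.copy_out(host, 4);
    return host[3];
}

// ADSM style: the CPU reads and writes the shared object directly,
// and the accelerator never needs to reach into CPU memory.
float adsm_model(ToyAccelerator& acc) {
    float* data = toy_adsm_alloc(acc, 4);
    for (int i = 0; i < 4; ++i) data[i] = static_cast<float>(i + 1);
    acc.scale(2.0f);
    return data[3];
}
```

Both functions compute the same result, but only `explicit_model` contains transfer calls in application code. That difference is the programming-effort argument the abstract makes. The sketch says nothing about performance, which depends on how a real run-time moves the data.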