IMPACT

An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems

	Paper of IMPACT - Cited Greater than 150 Times

Publication Year:
	2010

Authors:
	Isaac Gelado, Javier Cabezas, Nacho Navarro, John E. Stone, Sanjay J. Patel, Wen-mei Hwu

Published:
	The ACM/IEEE 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'10), Pittsburgh, PA, March 13-17, 2010

Abstract:
	Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existing programming models for heterogeneous computing rely on programmers to explicitly manage data transfers between the CPU system memory and accelerator memory. This paper presents a new programming model for heterogeneous computing, called Asymmetric Distributed Shared Memory (ADSM), that maintains a shared logical memory space for CPUs to access objects in the accelerator physical memory but not vice versa. The asymmetry allows light-weight implementations that avoid common pitfalls of symmetrical distributed shared memory systems. ADSM allows programmers to assign data objects to performance critical methods. When a method is selected for accelerator execution, its associated data objects are allocated within the shared logical memory space, which is hosted in the accelerator physical memory and transparently accessible by the methods executed on CPUs. We argue that ADSM reduces programming efforts for heterogeneous computing systems and enhances application portability. We present a software implementation of ADSM, called GMAC, on top of CUDA in a GNU/Linux environment. We show that applications written in ADSM and running on top of GMAC achieve performance comparable to their counterparts using programmer-managed data transfers. This paper presents the GMAC system and evaluates different design choices. We further suggest additional architectural support that will likely allow GMAC to achieve higher application performance than the current CUDA model.