Scalable SIMD-parallel memory allocation for many-core machines
Publication Year: 2011
  Victor Huang, Christopher I. Rodrigues, Stephen Jones, Ian Buck, Wen-mei Hwu
  The Journal of Supercomputing, 9 Sep 2011

Dynamic memory allocation is an important feature of modern programming systems. However, the cost of memory allocation in massively parallel execution environments such as CUDA has been too high for many types of kernels. This paper presents XMalloc, a high-throughput memory allocation mechanism that dramatically magnifies the allocation throughput of an underlying memory allocator. XMalloc embodies two key techniques: allocation coalescing and buffering using efficient queues. This paper describes these two techniques and presents our implementation of XMalloc as a memory allocator library. The library is designed to be called from kernels executed by massive numbers of threads. Our experimental results based on the NVIDIA G480 GPU show that XMalloc magnifies the allocation throughput of the underlying memory allocator by a factor of 48.
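
To make the idea of allocation coalescing concrete, here is a minimal CPU-side sketch in C++. It is not the authors' implementation or the XMalloc API: the names `coalesced_alloc`, `coalesced_free`, `BlockHeader`, and `LaneHeader`, and the memory layout, are assumptions made for illustration. The sketch treats a vector of request sizes as the requests of one SIMD group, serves them with a single call to the underlying allocator, and releases the shared block once every sub-allocation has been freed. The buffering technique is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// All names below are hypothetical; this only illustrates the coalescing idea.
constexpr std::size_t kAlign = alignof(std::max_align_t);

// Round n up to the next multiple of kAlign.
constexpr std::size_t align_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

struct BlockHeader { std::size_t live; };   // sub-allocations not yet freed
struct LaneHeader  { BlockHeader* block; }; // back-pointer to the shared block

constexpr std::size_t kBlockHdr = align_up(sizeof(BlockHeader));
constexpr std::size_t kLaneHdr  = align_up(sizeof(LaneHeader));

static std::size_t g_underlying_calls = 0;  // counts calls to std::malloc

// Serve every request of one SIMD group with a single underlying allocation.
// Returns one pointer per lane; lanes that asked for 0 bytes get nullptr.
std::vector<void*> coalesced_alloc(const std::vector<std::size_t>& sizes) {
    std::vector<void*> out(sizes.size(), nullptr);
    std::vector<std::size_t> offset(sizes.size(), 0);

    // Exclusive prefix sum over the padded request sizes.
    std::size_t total = kBlockHdr;
    std::size_t live = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (sizes[i] == 0) continue;
        offset[i] = total;
        total += kLaneHdr + align_up(sizes[i]);
        ++live;
    }
    if (live == 0) return out;

    ++g_underlying_calls;
    auto* base = static_cast<unsigned char*>(std::malloc(total));
    if (base == nullptr) return out;  // every lane sees the failure

    auto* block = new (base) BlockHeader{live};
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (sizes[i] == 0) continue;
        unsigned char* p = base + offset[i];
        new (p) LaneHeader{block};
        out[i] = p + kLaneHdr;        // payload starts after the lane header
    }
    return out;
}

// Free one sub-allocation; the last free releases the whole coalesced block.
// On a GPU this decrement would have to be atomic.
void coalesced_free(void* ptr) {
    if (ptr == nullptr) return;
    auto* lane = reinterpret_cast<LaneHeader*>(
        static_cast<unsigned char*>(ptr) - kLaneHdr);
    BlockHeader* block = lane->block;
    assert(block->live > 0);
    if (--block->live == 0) std::free(block);  // block sits at the malloc'd address
}
```

In this sketch, a group that makes N non-empty requests calls the underlying allocator once instead of N times. The price is a small header per sub-allocation, and the shared block stays alive until its last sub-allocation is freed.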

@article {springerlink:10.1007/s11227-011-0680-7,
author = {Huang, Xiaohuang and Rodrigues, Christopher and Jones, Stephen and Buck, Ian and Hwu, Wen-mei},
affiliation = {University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA},
title = {Scalable SIMD-parallel memory allocation for many-core machines},
journal = {The Journal of Supercomputing},
publisher = {Springer Netherlands},
issn = {0920-8542},
keyword = {Computer Science},
pages = {1-13},
url = {http://dx.doi.org/10.1007/s11227-011-0680-7},
doi = {10.1007/s11227-011-0680-7},