HyperLink   Parallel Merge for Many-Core Architectures
Publication Year:
  Jie Lv
  Diss. 2016.

This thesis proposes a novel GPU implementation for merging two sorted arrays. We consider the problem of merging two arrays A and B into a single array C. Each element in the arrays has a key. An ordering relation denoted by is defined on the keys. Array A and array B have m and n elements, respectively, where m and n do not have to be equal. Both array A and array B are sorted based on the ordering relation. The task is to produce the output array C of size m + n. Array C consists of all the input elements from array A and array B, and is sorted by the ordering relation. We applied several GPU-specific optimizations to a parallel merge algorithm. The optimizations include coordinating the memory access pattern, making full use of the shared memory and reducing the thread divergence. Our implementation achieves up to 10x and 40x speedup on Titan-Z and GTX 980 GPU respectively compared to thrust merge implementation.