A study of the implementation patterns among massively threaded applications for many-core GPUs reveals that each of the seven most commonly used algorithm and data optimization techniques can enhance the performance of applicable kernels by 2× to 10× on current processors while also improving future scalability. The featured Web extra is a video interview with author John Stratton, who describes how implementation patterns can improve future scalability (http://youtu.be/fgn9LJbInMw).