Graphics processing units (GPUs) can provide excellent speedups on some,
but not all, general-purpose workloads. Using a set of computational
GPU kernels as examples, the authors show how to adapt kernels to
exploit the architectural features of a GeForce 8800 GPU and what
ultimately limits the achievable performance.