Large-scale GPU clusters are gaining popularity in the scientific
computing community. However, their deployment and production use pose
a number of new challenges. In this paper, we present
our efforts to address some of the challenges of building and running
GPU clusters in HPC environments. We touch upon issues such as balanced
cluster architecture, resource sharing in a cluster environment,
programming models, and applications for GPU clusters.