In this paper, we present a fast iterative magnetic resonance imaging (MRI) reconstruction algorithm that takes advantage of the prevailing GPGPU programming paradigm. In clinical settings, MRI reconstruction is usually performed via the fast Fourier transform (FFT). However, imaging artifacts (such as signal loss) resulting from susceptibility-induced magnetic field inhomogeneities degrade the quality of the reconstructed images. These artifacts must be addressed by accurately modeling the physics of the system and coupling that model with iterative reconstruction. We have developed a reconstruction algorithm that improves image quality at the expense of computation time, and have therefore implemented it on GPUs to achieve a significant speedup. Here we extend our previous GPU implementation with several new features. First, we enable Sensitivity Encoding for Fast MRI (SENSE) reconstruction from data acquired using a multi-receiver coil array, which can reduce the acquisition time. Second, we have implemented GPU-based total variation regularization within our SENSE reconstruction framework. We describe the optimizations employed at the levels of the algorithm, the program code structure, and architecture-specific performance tuning, covering both our MRI reconstruction algorithm and the specifics of the GPU hardware. Results show that the current GPU implementation produces accurate image estimates while significantly accelerating the reconstruction.