We further accelerate the Illinois Massively Parallel Acceleration Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI) package to approach clinically acceptable reconstruction times while still taking advantage of a variety of advanced image acquisition and reconstruction techniques. The improved IMPATIENT implements a faster Toeplitz-based iterative image reconstruction method, whose computation time is further reduced by an optimally tuned, GPU-accelerated gridding implementation. We demonstrate that the field-corrected SENSE Toeplitz code, running on an NVIDIA Tesla C1060, reduces a week-long, non-Cartesian, 3D, 1 mm³ high-resolution whole-brain DTI reconstruction (4-channel acquisition) to 4.3 hours. These improvements will enable advances in 3D non-Cartesian sequences, such as cones and stacks of spirals.
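The Toeplitz-based approach rests on the observation that, for a non-Cartesian Fourier encoding operator $A$, the normal operator $A^H A$ is Toeplitz: its entries depend only on the voxel offset $n - n'$. Embedding that Toeplitz matrix in a circulant of twice the size lets each iteration apply $A^H A$ with zero-padded FFTs instead of a forward and adjoint non-uniform transform. The sketch below is a minimal 1D illustration of that idea only, not the IMPATIENT code. It assumes a direct non-uniform DFT in place of gridding, ignores field correction, coil sensitivities, and density weighting, and all function names are hypothetical.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def forward(x, ks):
    """Hypothetical direct non-uniform DFT: y_m = sum_n x_n exp(-i k_m n)."""
    return [sum(xn * cmath.exp(-1j * k * n) for n, xn in enumerate(x)) for k in ks]


def adjoint(y, ks, N):
    """Adjoint of forward(): x_n = sum_m y_m exp(+i k_m n)."""
    return [sum(ym * cmath.exp(1j * k * n) for ym, k in zip(y, ks)) for n in range(N)]


def toeplitz_kernel(ks, N):
    """FFT of the length-2N circulant embedding of t[d] = sum_m exp(i k_m d).

    Depends only on the trajectory ks, so it can be precomputed once
    and reused in every iteration.
    """
    c = [0j] * (2 * N)
    for d in range(-(N - 1), N):
        # Negative lags wrap to the top half; index N stays zero.
        c[d % (2 * N)] = sum(cmath.exp(1j * k * d) for k in ks)
    return fft(c)


def toeplitz_normal(x, kernel_hat):
    """Apply A^H A to x via zero-padded circular convolution."""
    N = len(x)
    X = fft(list(x) + [0j] * N)
    z = fft([a * b for a, b in zip(X, kernel_hat)], inverse=True)
    # Normalize the inverse FFT and keep the first N samples.
    return [v / (2 * N) for v in z[:N]]
```

With `N = 8` voxels and an arbitrary set of non-uniform frequencies, `toeplitz_normal(x, toeplitz_kernel(ks, 8))` matches `adjoint(forward(x, ks), ks, 8)` to floating-point precision. The 2N padding is what prevents wrap-around: every lag from -(N-1) to N-1 lands in a distinct circulant slot, so the circular convolution reproduces the linear Toeplitz product exactly.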