Tolerating Data Access Latency with Register Preloading.
Publication Year: 1992
  William Y. Chen, Scott A. Mahlke, Wen-mei Hwu, Tokuzo Kiyohara, Pohua P. Chang
  Proceedings of the 1992 Int'l Conf. on Supercomputing, pp. 104-113, Washington D.C., July, 1992

By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead of their uses to hide this latency, but conventional movement of load instructions is limited by data dependence analysis. This paper introduces a simple hardware scheme, referred to as preload register update, that allows the compiler to move load instructions even in the presence of inconclusive data dependence analysis results. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
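
The coherence idea in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the hardware design from the paper: it assumes each register carries a hypothetical address tag that is set by a preload and cleared on first use, and that every store compares its address against those tags. The names `PreloadRegisterFile`, `preload`, `store`, `use`, `run_hoisted`, and `run_original` are invented for this illustration.

```python
class PreloadRegisterFile:
    """Toy model of registers kept coherent with later stores (illustrative only)."""

    def __init__(self, num_regs, memory):
        self.regs = [0] * num_regs
        # Assumed bookkeeping: the address each register was preloaded from,
        # or None when the register is not being tracked.
        self.preload_addr = [None] * num_regs
        self.memory = memory

    def preload(self, rd, addr):
        # A load that the compiler hoisted above stores it could not disambiguate.
        self.regs[rd] = self.memory.get(addr, 0)
        self.preload_addr[rd] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        # Forward the stored value to any register preloaded from this address.
        for rd, tag in enumerate(self.preload_addr):
            if tag == addr:
                self.regs[rd] = value

    def use(self, rd):
        # Assumption: tracking ends at the first use of the preloaded value.
        self.preload_addr[rd] = None
        return self.regs[rd]


def run_original(alias):
    """Original order: store first, then load from address 100."""
    memory = {100: 1, 200: 2}
    memory[100 if alias else 200] = 42
    return memory[100]


def run_hoisted(alias):
    """Reordered: the load from address 100 is moved above the store."""
    rf = PreloadRegisterFile(4, {100: 1, 200: 2})
    rf.preload(1, 100)
    rf.store(100 if alias else 200, 42)
    return rf.use(1)
```

In this model, `run_hoisted` returns the same value as `run_original` whether or not the store aliases the load, which is the property that lets the load be moved without a conclusive dependence analysis.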