By exploiting fine-grain parallelism, superscalar
processors can potentially increase the performance of
future supercomputers. However, supercomputers typically
have a long access delay to their first-level memory, which
can severely restrict the performance of superscalar
processors. Compilers attempt to move load instructions
far enough ahead to hide this latency. However,
conventional movement of load instructions is limited by
data dependence analysis. This paper introduces a simple
hardware scheme, referred to as preload register update,
to allow the compiler to move load instructions even in
the presence of inconclusive data dependence analysis
results. Preload register update keeps the load
destination registers coherent when load instructions
are moved past store instructions that reference the same
location. With this addition, superscalar processors can
more effectively tolerate longer data access latencies.
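
The coherence behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design itself: the class name `PreloadMachine`, the table `preloads`, and the `commit` operation (which ends tracking at the load's original position) are hypothetical names introduced for illustration. The model assumes that each preload records its source address, and that every later store compares its address against those records and forwards its value to any matching register.

```python
class PreloadMachine:
    """Toy model: memory, registers, and a table of outstanding preloads."""

    def __init__(self):
        self.mem = {}        # address -> value
        self.regs = {}       # register name -> value
        self.preloads = {}   # register name -> address it was preloaded from (assumed structure)

    def preload(self, reg, addr):
        # A load hoisted above stores that may alias it: load the value
        # and remember the address so later stores can be checked.
        self.regs[reg] = self.mem.get(addr, 0)
        self.preloads[reg] = addr

    def store(self, addr, value):
        self.mem[addr] = value
        # Keep preloaded registers coherent: forward the stored value to
        # every register whose recorded address matches this store.
        for reg, pre_addr in self.preloads.items():
            if pre_addr == addr:
                self.regs[reg] = value

    def commit(self, reg):
        # Hypothetical marker at the load's original position: stop
        # tracking the register and return its current value.
        self.preloads.pop(reg, None)
        return self.regs[reg]


m = PreloadMachine()
m.mem[100] = 1
m.preload("r1", 100)     # load moved above the store
m.store(100, 7)          # the store turns out to reference the same location
result = m.commit("r1")  # r1 holds the stored value, as if the load had not moved
```

In this model, the register ends up with the value it would have received had the load stayed below the store, which is why the compiler can hoist the load without proving that the two addresses differ.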