By exploiting fine-grain parallelism, superscalar 
processors can potentially increase the performance of 
future supercomputers.  However, supercomputers typically 
have a long access delay to their first-level memory, which 
can severely restrict the performance of superscalar 
processors.  Compilers attempt to move load instructions 
far enough ahead to hide this latency.  Conventional 
movement of load instructions, however, is limited by the 
precision of data dependence analysis.  This paper introduces a simple 
hardware scheme, referred to as preload register update, 
to allow the compiler to move load instructions even in 
the presence of inconclusive data dependence analysis 
results.  Preload register update keeps the load 
destination registers coherent when load instructions 
are moved past store instructions that reference the same 
location.  With this addition, superscalar processors can 
more effectively tolerate longer data access latencies.
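
The coherence idea can be illustrated with a small software model.  The sketch below is only a toy under stated assumptions, not the hardware design itself: the class `ToyMachine`, its methods `preload`, `store`, and `commit`, and the per-register address table are hypothetical names chosen for illustration.  It assumes that each preloaded register remembers the address it was loaded from, that every store compares its address against those remembered addresses, and that a matching store also writes its value into the register.

```python
class ToyMachine:
    """Toy model of memory, registers, and preloaded-register tracking.

    All names here are hypothetical; this is not the actual hardware design.
    """

    def __init__(self, memory):
        self.memory = dict(memory)
        self.regs = {}
        # Assumed bookkeeping: register name -> address it was preloaded from.
        self.preload_addr = {}

    def preload(self, reg, addr):
        """A load that was moved above a possibly conflicting store."""
        self.regs[reg] = self.memory[addr]
        self.preload_addr[reg] = addr

    def store(self, addr, value):
        """Write memory, then update any preloaded register watching addr."""
        self.memory[addr] = value
        for reg, watched in self.preload_addr.items():
            if watched == addr:
                self.regs[reg] = value

    def commit(self, reg):
        """Reached the load's original position: stop tracking the register."""
        self.preload_addr.pop(reg, None)
        return self.regs[reg]


def run_original(alias):
    """Program order: the store executes first, then the load from address 100."""
    memory = {100: 1, 200: 2}
    memory[100 if alias else 200] = 42
    return memory[100]


def run_hoisted(alias):
    """Same program with the load moved above the store as a preload."""
    m = ToyMachine({100: 1, 200: 2})
    m.preload("r1", 100)
    m.store(100 if alias else 200, 42)
    return m.commit("r1")
```

In this model, `run_hoisted` returns the same value as `run_original` whether or not the store aliases the load, which is the property that lets the load be moved without a conclusive dependence analysis.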