Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-order microarchitecture design and compiler development. Compilers can generate schedules with a high degree of instruction-level parallelism but cannot effectively accommodate unanticipated latencies; incorporating traditional out-of-order execution into the microarchitecture hides some of this latency but redundantly performs work done by the compiler and adds pipeline stages. Although effective techniques, such as prefetching and threading, have been proposed to deal with anticipable, long-latency misses, the shorter, more diffuse stalls due to difficult-to-anticipate first- or second-level misses are less easily hidden on in-order architectures. This paper addresses this problem by proposing a microarchitectural technique, referred to as two-pass pipelining, wherein the program executes on two in-order back-end pipelines coupled by a queue. The "advance" pipeline executes instructions greedily, without stalling on unanticipated latency dependences: independent instructions execute immediately, while instructions that would otherwise block are deferred. The "backup" pipeline allows concurrent resolution of instructions that were deferred in the other pipeline, resulting in the absorption of shorter misses and the overlap of longer ones. This paper argues that this design is both achievable and a good use of transistor resources, and shows results indicating that it can deliver significant speedups for in-order processor designs.
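To make the advance/backup division concrete, the following is a minimal Python sketch of the deferral-and-replay idea, not the paper's hardware: the advance pass runs instructions whose operands are ready and defers the rest to a queue, and the backup pass later resolves the deferred instructions in order. The instruction encoding, register names, and the single outstanding load miss are illustrative assumptions.

# Sketch of the two-pass (advance/backup) idea.
# Illustrative model only: register names and the miss scenario are assumptions.
from collections import deque

def advance_pass(program, ready):
    """Greedy first pass: never stall; defer any instruction with a not-ready source."""
    deferred = deque()
    for dest, srcs in program:
        if all(ready.get(s, False) for s in srcs):
            ready[dest] = True              # executes in the advance pipeline
        else:
            deferred.append((dest, srcs))   # would have stalled an in-order core
            ready[dest] = False             # result left for the backup pass
    return deferred

def backup_pass(deferred, ready):
    """Second pass: once outstanding misses return, resolve deferred work in order."""
    while deferred:
        dest, srcs = deferred.popleft()
        assert all(ready.get(s, False) for s in srcs), "operand miss not yet resolved"
        ready[dest] = True
    return ready

if __name__ == "__main__":
    # r1 is a load that misses; r2 and r5 depend on it (directly or transitively);
    # r3 and r4 are independent of the miss.
    program = [("r2", ["r1"]), ("r3", ["r0"]), ("r4", ["r3"]), ("r5", ["r2"])]
    ready = {"r0": True, "r1": False}       # r1 still outstanding during the advance pass
    q = advance_pass(program, ready)
    print("deferred:", [d for d, _ in q])   # ['r2', 'r5'] -- independent work was not blocked
    ready["r1"] = True                      # the miss returns
    backup_pass(q, ready)
    print("all results ready:", all(ready.values()))

In this toy run the advance pass completes r3 and r4 despite the outstanding miss, and the backup pass drains the queue in program order once r1 arrives, which mirrors how the two coupled pipelines absorb short misses and overlap longer ones.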