Beating in-order stalls with "flea-flicker" two-pass pipelining
   
Publication Year:
  2003
Authors:
  Ronald D. Barnes, Erik M. Nystrom, John W. Sias, Sanjay J. Patel, Nacho Navarro, Wen-mei Hwu
   
Published:
  MICRO 36: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003
   
Abstract:

Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-order microarchitecture design and compiler development. Compilers can generate schedules with a high degree of instruction-level parallelism but cannot effectively accommodate unanticipated latencies; incorporating traditional out-of-order execution into the microarchitecture hides some of this latency but redundantly performs work done by the compiler and adds additional pipeline stages. Although effective techniques, such as prefetching and threading, have been proposed to deal with anticipable, long-latency misses, the shorter, more diffuse stalls due to difficult-to-anticipate, first- or second-level misses are less easily hidden on in-order architectures. This paper addresses this problem by proposing a microarchitectural technique, referred to as two-pass pipelining, wherein the program executes on two in-order back-end pipelines coupled by a queue. The "advance" pipeline executes instructions greedily, without stalling on unanticipated latency dependences (executing independent instructions while otherwise blocking instructions are deferred). The "backup" pipeline allows concurrent resolution of instructions that were deferred in the other pipeline, resulting in the absorption of shorter misses and the overlap of longer ones. This paper argues that this design is both achievable and a good use of transistor resources and shows results indicating that it can deliver significant speedups for in-order processor designs.
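The abstract describes the mechanism only at a high level. The Python sketch below is a toy timing model built on our own simplifying assumptions, not the authors' design. It assumes single-issue pipelines, an unbounded coupling queue, a fixed advance-to-backup lag (the hypothetical `distance` parameter), and no branches or memory-ordering effects. Latencies are known to the model but not to the compiler's schedule. The names `Instr`, `stall_on_use`, and `two_pass` are hypothetical. The model compares a stall-on-use in-order pipeline with a two-pass arrangement on a short sequence containing two independent load misses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str                    # human-readable label, e.g. "ld r1 <- [a]"
    dest: Optional[str]          # destination register, or None
    srcs: tuple = ()             # source register names
    latency: int = 1             # actual (toy) cycles until dest is ready


def stall_on_use(program):
    """Baseline single-issue in-order pipe: stall until all sources are ready."""
    ready, cycle, finish = {}, 0, 0   # reg -> ready cycle; next issue slot
    for ins in program:
        issue = max([cycle] + [ready.get(s, 0) for s in ins.srcs])
        if ins.dest:
            ready[ins.dest] = issue + ins.latency
        cycle = issue + 1
        finish = max(finish, cycle, issue + ins.latency)
    return finish


def two_pass(program, distance=4):
    """Toy two-pass model: the advance pass never stalls; the backup pass trails it."""
    # Pass 1 (advance): one instruction per cycle. An instruction whose source
    # is unready, or was produced by a deferred instruction, is deferred and
    # "poisons" its destination so that its dependents are deferred too.
    ready, poisoned, deferred, adv_done = {}, set(), [], []
    for cycle, ins in enumerate(program):
        blocked = any(s in poisoned or ready.get(s, 0) > cycle for s in ins.srcs)
        deferred.append(blocked)
        adv_done.append(None if blocked else cycle + ins.latency)
        if ins.dest:
            if blocked:
                poisoned.add(ins.dest)
            else:
                poisoned.discard(ins.dest)
                ready[ins.dest] = cycle + ins.latency
    # Pass 2 (backup): the queue delivers instruction i no earlier than cycle
    # i + distance. Only deferred instructions execute here (and may stall);
    # results already produced by the advance pass are carried forward.
    ready, slot, finish = {}, -1, 0
    for i, ins in enumerate(program):
        slot = max(slot + 1, i + distance)
        if deferred[i]:
            slot = max([slot] + [ready.get(s, 0) for s in ins.srcs])
            done = slot + ins.latency
        else:
            done = adv_done[i]
        if ins.dest:
            ready[ins.dest] = done
        finish = max(finish, slot + 1, done)
    return finish, [ins.text for ins, d in zip(program, deferred) if d]


PROGRAM = [
    Instr("ld  r1 <- [a]", "r1", (), latency=10),   # load that misses (toy latency)
    Instr("add r2 <- r1, 1", "r2", ("r1",)),
    Instr("ld  r3 <- [b]", "r3", (), latency=10),   # independent load that also misses
    Instr("add r4 <- r3, 1", "r4", ("r3",)),
    Instr("add r5 <- r2, r4", "r5", ("r2", "r4")),
]

print(stall_on_use(PROGRAM))   # 23
print(two_pass(PROGRAM))       # (14, ['add r2 <- r1, 1', 'add r4 <- r3, 1', 'add r5 <- r2, r4'])
```

In this toy run, the baseline cannot issue the second load until the first miss resolves, so the two misses serialize. The advance pass defers the consumers of each load and keeps issuing, so the second load starts at cycle 2 instead of cycle 11 and the two misses overlap. The trade-off is that the backup pass trails by `distance` cycles. For a program with no misses, the toy model finishes later than the baseline, which mirrors the abstract's concern about extra pipeline stages.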