Multiple-Pass Pipelining: Enhancing in-order Microarchitectures to Out-Of-Order Performance.
  Ronald D. Barnes
  PhD thesis, Department of Electrical and Computer Engineering, University of Illinois, Urbana IL, 2005

Out-of-program-order execution has become an almost ubiquitous characteristic of modern processors because of its ability to tolerate variable memory-instruction latency. As designs become increasingly power-conscious, the cost and complexity of the components of out-of-order execution are becoming problematic. Compilers have generally proven adept at planning useful static instruction-level parallelism, but relying solely on the compiler's instruction arrangement has been shown to perform poorly when cache misses occur. This work proposes two multiple-pass pipelining "flea-flicker" microarchitectural techniques, two-pass pipelining and multipass pipelining. Both exploit a static compiler's meticulous scheduling while also advancing execution beyond otherwise stalled instructions, without the complexity of true out-of-order execution.

With two-pass pipelining, programs execute on two in-order back-end pipelines coupled by a queue. Rather than stalling, the "advance" pipeline often defers instructions that dispatch with unready operands. The "backup" pipeline concurrently resolves the instructions deferred by the first pipeline, so that useful "advance" execution overlaps with miss resolution. Multipass pipelining is based on a similar concept but overcomes the shortfalls of two-pass pipelining by executing architectural and advance instructions simultaneously on a common pipeline, in a manner similar to simultaneous multithreading. These techniques perform similarly to achievable out-of-order designs while comparing favorably in terms of power and complexity. An accompanying compiler technique and instruction marking further improve the handling of miss latencies and reduce fruitless speculative execution by statically denoting instructions that, when stalled, indicate there is little opportunity for advance execution.
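The deferral idea behind two-pass pipelining can be illustrated with a toy cycle-count model. This is a minimal sketch built on simplifying assumptions chosen here for illustration, not the design evaluated in the thesis: each pipe handles one instruction per cycle, the coupling queue is unbounded, destination registers are unique (SSA-style), a cache miss is given a hypothetical fixed latency of 100 cycles, and neither multipass pipelining nor the compiler's instruction marking is modeled. The names `Instr`, `in_order_stall`, and `two_pass` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str               # destination register (assumed unique, SSA-style)
    srcs: tuple[str, ...]   # source registers, all produced by earlier instructions
    latency: int            # cycles until the result is available


def in_order_stall(program):
    """Baseline: one in-order pipe that stalls at the first use of an unready operand."""
    ready = {}
    issue = -1
    for ins in program:
        # Issue no earlier than the next cycle and no earlier than every source is ready.
        issue = max([issue + 1] + [ready[s] for s in ins.srcs])
        ready[ins.dest] = issue + ins.latency
    return max(ready.values())


def two_pass(program):
    """Toy two-pass model: an advance pipe that never stalls, followed by a backup pipe."""
    ready = {}
    deferred = set()
    # Advance pass: one instruction per cycle; defer instead of stalling on unready operands.
    for t, ins in enumerate(program):
        if any(s in deferred or ready[s] > t for s in ins.srcs):
            deferred.add(ins.dest)  # the backup pipe will produce this result
        else:
            ready[ins.dest] = t + ins.latency
    # Backup pass: replays the queue in program order; only deferred instructions execute here.
    b = 0
    for t, ins in enumerate(program):
        b = max(b + 1, t + 1)  # reaches the backup pipe after its advance cycle
        if ins.dest in deferred:
            b = max([b] + [ready[s] for s in ins.srcs])  # stall-on-use in the backup pipe
            ready[ins.dest] = b + ins.latency
    return max([b] + list(ready.values()))


prog = [
    Instr("r1", (), 100),         # load that misses (assumed 100-cycle latency)
    Instr("r2", ("r1",), 1),      # depends on the first miss
    Instr("r3", (), 100),         # independent load that also misses
    Instr("r4", ("r3",), 1),
    Instr("r5", ("r2", "r4"), 1),
]
print(in_order_stall(prog), two_pass(prog))  # 203 104
```

In this toy run the advance pipe initiates the second, independent miss while the first is still outstanding, so the two miss latencies overlap instead of serializing. A realistic model would also have to stall the advance pipe when the queue fills and handle register reuse.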