Tolerating Cache-Miss Latency With Multipass Pipelines
  Ronald D. Barnes, Shane Ryoo, Wen-mei Hwu
  IEEE Micro, Vol. 26, No. 1, January-February 2006

Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but they require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory-access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of preexecution and execution to achieve its performance. Multipass pipelining assumes compile-time scheduling, enabling lower-power, lower-complexity exploitation of instruction-level parallelism.
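To make the two-pass idea concrete, the following is a minimal, purely illustrative sketch (not the authors' design; all names, the instruction format, and the pass structure are assumptions). A first pass runs in program order but, instead of stalling on a cache-missing load, poisons the load's result and preexecutes any independent instructions that follow; a second pass re-executes only the deferred, miss-dependent slice once the miss has resolved.

```python
# Toy model of "flea-flicker" multipass pipelining (illustrative only;
# instruction format and pass structure are assumptions, not the
# microarchitecture described in the article).

INVALID = object()  # sentinel for a value poisoned by a cache miss

def multipass_execute(program, memory, miss_addrs):
    """program: list of (dest, op, srcs); op is 'const', 'load', or 'add'."""
    regs = {}
    deferred = []  # miss-dependent instructions, kept in program order

    # Pass 1: in-order preexecution; skip over cache-missing loads
    # instead of stalling, and preexecute independent work.
    for inst in program:
        dest, op, srcs = inst
        if op == 'const':
            regs[dest] = srcs[0]
        elif op == 'load':
            addr = regs.get(srcs[0], INVALID)
            if addr is INVALID or addr in miss_addrs:
                regs[dest] = INVALID      # poison the result, keep going
                deferred.append(inst)
            else:
                regs[dest] = memory[addr]
        elif op == 'add':
            a, b = (regs.get(s, INVALID) for s in srcs)
            if a is INVALID or b is INVALID:
                regs[dest] = INVALID      # depends on a missed load
                deferred.append(inst)
            else:
                regs[dest] = a + b        # independent: preexecuted

    # Pass 2: the miss has returned; re-execute only the deferred slice.
    for dest, op, srcs in deferred:
        if op == 'load':
            regs[dest] = memory[regs[srcs[0]]]
        elif op == 'add':
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
    return regs

mem = {100: 7}
prog = [
    ('r1', 'const', [100]),
    ('r2', 'load',  ['r1']),        # misses in the cache
    ('r3', 'const', [1]),
    ('r4', 'add',   ['r3', 'r3']),  # independent: done in pass 1
    ('r5', 'add',   ['r2', 'r4']),  # miss-dependent: done in pass 2
]
regs = multipass_execute(prog, mem, miss_addrs={100})
```

In this toy run, pass 1 completes `r4` ahead of the stalled load, and pass 2 finishes `r2` and `r5`; only the miss-dependent slice pays for the second pass, which is the source of the approach's latency tolerance.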