Microprocessors exploit instruction-level parallelism and tolerate
memory-access latencies to achieve high performance. Out-of-order
microprocessors do this by dynamically scheduling instruction execution,
but require power-hungry hardware structures. This article describes
multipass pipelining, a microarchitectural model that provides an
alternative to out-of-order execution for tolerating memory-access
latencies. We call our approach "flea-flicker" multipass pipelining
because it uses two (or more) passes of preexecution or execution to
achieve performance efficacy. Multipass pipelining assumes compile-time
scheduling for lower-power and lower-complexity exploitation of
instruction-level parallelism.