Run-Time Optimization Architecture
Matthew C. Merten
Ph.D. dissertation, Department of Electrical and Computer Engineering, University of Illinois,
Urbana IL, August 2002
Each new generation of wide-issue processors continues to achieve
higher performance by exploiting greater amounts of instruction-level
parallelism than the previous generation. Dynamic techniques such as
out-of-order execution with hardware speculation have proven effective
at increasing instruction throughput, parallelism, and utilization of
processor resources. Run-time optimization techniques promise to
enable an even higher level of performance by applying aggressive
transformations at run-time that optimize across module boundaries,
adapt code regions to changing input patterns, and customize code
sequences for the underlying microarchitecture.
This thesis presents a hardware mechanism for generating and deploying
run-time optimized code. The system exploits program execution
phasing by automatically detecting and optimizing the set of
instruction sequences that make up a phase, called a hot spot. The hardware
mechanism can be viewed as a filtering system that resides after the
retirement stage of the processor pipeline, accepts an instruction
execution stream as input, and produces instruction profiles and sets
of linked, optimized traces as output. The code deployment mechanism
uses an extension to the branch prediction mechanism to migrate
execution into the new code without modifying the original code.
Because these new components operate in parallel with native
execution, they add no delay to the execution of the program except
during short bursts of reoptimization. This technique provides a strong
platform for run-time optimization because the hot execution regions
are extracted, optimized, and written to main memory for execution,
where they persist across context switches. The framework is
designed to preserve precise exception handling while applying
optimizations, which currently include partial function inlining (even
into shared libraries), code straightening, loop unrolling, peephole
optimizations, and instruction rescheduling with renaming. All of
these are performed concurrently with the running application.
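
The filtering and deployment ideas described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the thesis: the class names (HotSpotFilter, RedirectTable), the counting policy, the threshold and window values, and the addresses are all hypothetical. It shows a filter that watches a stream of retired branch addresses and reports the frequently executed ones, and a lookup table that steers fetch toward optimized code while the original code is left untouched.

```python
from collections import Counter


class HotSpotFilter:
    """Toy model of a post-retirement profiling filter (hypothetical design)."""

    def __init__(self, threshold=4, window=16):
        self.threshold = threshold  # executions needed to call a branch hot (assumed)
        self.window = window        # retired branches per profiling interval (assumed)
        self.counts = Counter()     # per-branch execution counts for this interval
        self.seen = 0               # branches retired so far in this interval

    def retire(self, branch_addr):
        """Observe one retired branch.

        Returns the set of hot branch addresses when an interval ends,
        otherwise None.
        """
        self.counts[branch_addr] += 1
        self.seen += 1
        if self.seen < self.window:
            return None
        hot = {addr for addr, n in self.counts.items() if n >= self.threshold}
        self.counts.clear()
        self.seen = 0
        return hot


class RedirectTable:
    """Toy model of predictor-side redirection; original code is never patched."""

    def __init__(self):
        self.entries = {}  # original target address -> optimized trace entry

    def install(self, original_target, optimized_entry):
        self.entries[original_target] = optimized_entry

    def next_fetch(self, predicted_target):
        # Fall back to the original target when no optimized trace exists.
        return self.entries.get(predicted_target, predicted_target)


def demo():
    filt = HotSpotFilter(threshold=4, window=16)
    table = RedirectTable()
    # Synthetic retired-branch stream: a tight loop plus a few cold branches.
    stream = [0x100, 0x120] * 6 + [0x200, 0x300, 0x400, 0x500]
    hot = set()
    for addr in stream:
        result = filt.retire(addr)
        if result is not None:
            hot = result
    # Pretend an optimizer wrote a trace for each hot branch at a made-up address.
    for addr in sorted(hot):
        table.install(addr, 0x9000 + addr)
    return hot, table
```

Calling demo() feeds one 16-branch interval through the filter and installs redirects only for the two loop branches. Keeping the mapping in a separate table, rather than rewriting the branch instructions, mirrors the abstract's point that execution migrates into the new code without modifying the original code.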