This paper presents a new mechanism for collecting and deploying
runtime-optimized code. The code-collecting component resides in the
instruction retirement stage and lays out hot execution paths to
improve the instruction fetch rate and to enable further code
optimization. The code deployment component uses an extension to the
Branch Target Buffer (BTB) to migrate execution into the new code without
modifying the original code. These components add no significant delay
to the program's total execution time.
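The two components can be illustrated with a small software model. This is a sketch, not the paper's hardware design. The names (`Machine`, `BTBEntry`, `opt_target`), the hotness threshold, the trace-length cap, and the code-cache base address are hypothetical choices made for illustration. The model makes three assumptions. First, a taken branch whose target has been reached `HOT_THRESHOLD` times marks that target as hot. Second, the instructions retired after it are recorded until the path returns to its head or reaches `MAX_TRACE_LEN`. Third, the recorded path is copied to a separate region while the original code is left untouched. Deployment is modeled as an extra BTB field. Once that field is set, it replaces the predicted target, so fetch enters the laid-out copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical parameters chosen for illustration, not taken from the paper.
HOT_THRESHOLD = 3         # taken-branch arrivals that mark a target as hot
MAX_TRACE_LEN = 8         # cap on instructions recorded per trace
CODE_CACHE_BASE = 0x1000  # start of the separate region holding laid-out traces


@dataclass
class BTBEntry:
    target: int                       # ordinary predicted target of the branch
    opt_target: Optional[int] = None  # assumed extension: entry of the laid-out copy


@dataclass
class Machine:
    btb: Dict[int, BTBEntry] = field(default_factory=dict)
    counts: Dict[int, int] = field(default_factory=dict)      # target PC -> arrivals
    code_cache: Dict[int, int] = field(default_factory=dict)  # new addr -> original PC
    traced: Set[int] = field(default_factory=set)             # heads already laid out
    next_free: int = CODE_CACHE_BASE
    recording: Optional[List[int]] = None  # PCs of the path being recorded, if any
    head_branch: int = 0                   # branch whose target starts that path
    head: int = 0                          # first PC of that path

    def retire(self, pc: int, taken_target: Optional[int] = None) -> None:
        """Collection component: sees each retired instruction in program order."""
        if self.recording is not None:
            # Finish when the path loops back to its head or hits the length cap.
            if self.recording and (pc == self.head or len(self.recording) >= MAX_TRACE_LEN):
                self._lay_out()
            else:
                self.recording.append(pc)
        if taken_target is None:
            return
        self.btb.setdefault(pc, BTBEntry(taken_target))
        self.counts[taken_target] = self.counts.get(taken_target, 0) + 1
        if (self.recording is None and taken_target not in self.traced
                and self.counts[taken_target] >= HOT_THRESHOLD):
            self.recording, self.head_branch, self.head = [], pc, taken_target

    def _lay_out(self) -> None:
        """Copy the recorded path contiguously into the code cache."""
        entry = self.next_free
        for orig_pc in self.recording or []:
            self.code_cache[self.next_free] = orig_pc  # original code is never written
            self.next_free += 1
        self.btb[self.head_branch].opt_target = entry  # deployment hook
        self.traced.add(self.head)
        self.recording = None

    def predict(self, branch_pc: int) -> Optional[int]:
        """Deployment component: fetch follows opt_target once it is installed."""
        entry = self.btb.get(branch_pc)
        if entry is None:
            return None
        return entry.opt_target if entry.opt_target is not None else entry.target
```

Consider five iterations of a three-instruction loop (PCs 100, 101, 102, with 102 branching back to 100). Feeding these to `retire` lays out one trace. After that, `predict(102)` returns the code-cache address instead of 100. In this model, recording follows the retired instruction stream rather than static code structure. A recorded path can therefore pass through calls and returns, which is one way a collector placed at retirement can yield traces that cross function boundaries.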
The code collection scheme enables safe runtime optimization along
paths that span function boundaries. This technique provides a better
platform for runtime optimization than trace caches because its traces
are longer and persist in main memory across context switches.
Additionally, these traces are less susceptible to transient behavior
because they are restricted to frequently executed code.
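The contrast with transient behavior can be made concrete with two deliberately simplified models, both hypothetical. The first is a trace cache that builds a trace on every miss and evicts with LRU. The second is a collector that records a path only after it has executed `threshold` times. Real trace caches are set-associative and use more elaborate fill policies. This sketch therefore only illustrates the filtering effect, not the actual behavior of either design.

```python
from collections import Counter, OrderedDict
from typing import Iterable, List, Tuple

Path = Tuple[int, ...]  # a path identified by the PCs of its instructions


def trace_cache_fills(paths: Iterable[Path], capacity: int) -> List[Path]:
    """Fill-on-miss LRU model: every path that misses is built and inserted."""
    cache: "OrderedDict[Path, None]" = OrderedDict()
    fills: List[Path] = []
    for p in paths:
        if p in cache:
            cache.move_to_end(p)       # hit: refresh LRU position
            continue
        fills.append(p)                # miss: build a trace for this path
        cache[p] = None
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used trace
    return fills


def hot_only_collections(paths: Iterable[Path], threshold: int) -> List[Path]:
    """Hotness-gated model: a path is collected once, after `threshold` executions."""
    counts: Counter = Counter()
    collected: List[Path] = []
    for p in paths:
        counts[p] += 1
        if counts[p] == threshold:
            collected.append(p)
    return collected
```

Consider a one-entry trace cache and the stream `[A]*10 + [B] + [A]*10`. The single execution of `B` evicts `A`, so `trace_cache_fills(stream, 1)` returns `[A, B, A]`. By contrast, `hot_only_collections(stream, 4)` returns only `[A]`.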
Empirical results show that, on average, this mechanism can achieve a
higher instruction fetch rate using only 12KB of hardware than a trace
cache that requires 15KB, while producing long, persistent traces better
suited to optimization.