To schedule code efficiently for superscalar and superpipelined
processors, it is necessary to move instructions across branches.
This requires increasing the scheduling scope beyond the basic
block. Superblock scheduling, a static scheduling method, is
a variant of trace scheduling that avoids the bookkeeping
complexity associated with branches into a trace by eliminating
these side entrances through tail duplication. Once
the scheduling scope is enlarged, moving an instruction above
a conditional branch is hazardous because the instruction
normally executes on only one path of that branch.
To allow the compiler to schedule code more aggressively, hardware
support can be provided to prevent such hazards. In this paper,
we analyze the architectural support for, and the performance of,
three superblock scheduling models.
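
As a rough illustration of tail duplication, the sketch below removes
branches into the middle of a trace from a small control-flow graph.
It is not the implementation evaluated in this paper. The function
name tail_duplicate, the dictionary-based graph representation, and
the block names are hypothetical, and the graph is assumed to be
acyclic.

```python
def tail_duplicate(cfg, trace):
    """Return a copy of `cfg` in which `trace` has no side entrances.

    `cfg` maps each block name to its list of successor names, and
    `trace` is a list of block names forming a path through `cfg`.
    Illustrative sketch only; assumes an acyclic graph.
    """
    cfg = {block: list(succs) for block, succs in cfg.items()}
    original = list(cfg)

    def side_preds(j):
        # Predecessors of trace[j] other than its predecessor in the trace.
        return [p for p in original
                if trace[j] in cfg[p] and p != trace[j - 1]]

    # First trace block (after the head) that has a side entrance.
    first = next((j for j in range(1, len(trace)) if side_preds(j)), None)
    if first is None:
        return cfg  # the trace is already a superblock

    # Duplicate the tail; edges inside the tail point at the copies.
    copy = {block: block + "'" for block in trace[first:]}
    for block in trace[first:]:
        cfg[copy[block]] = [copy.get(s, s) for s in cfg[block]]

    # Redirect every side entrance to the duplicated tail.
    for j in range(first, len(trace)):
        for p in side_preds(j):
            cfg[p] = [copy[trace[j]] if s == trace[j] else s
                      for s in cfg[p]]
    return cfg


# A diamond: A branches to B or C, and both paths rejoin at D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
superblock_cfg = tail_duplicate(diamond, ["A", "B", "D", "E"])
```

After the call, block C branches to the copy D' rather than to D, so
the trace A, B, D, E is entered only at A, and a scheduler can move
instructions within it without compensating for side entrances.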