Loops containing control flow are problematic for VLIWs relying on
loop buffers. Full predication increases code size, while partial
predication does not support general if-conversion. Here a compromise
approach is proposed and evaluated using media applications. Compiler
techniques are demonstrated that arrange for 70-99% of fetched
operations to come from a statically managed 256-instruction loop
buffer, allowing instruction fetch power savings and eliminating
branch penalties. Also introduced is a form of predication
specialized to permit if-conversion with one bit in each operation and
to eliminate much of the hardware overhead of a predicate
register-based approach.
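
For readers unfamiliar with the term, if-conversion in general replaces a branch inside a loop body with straight-line operations guarded by a predicate. The following is a minimal sketch of that general idea only, not of the one-bit predication scheme proposed here. The function names, the clipping example, and the arithmetic select are all illustrative assumptions.

```python
def clip_branchy(samples, limit):
    """Loop whose body contains control flow (a branch per iteration)."""
    out = []
    for x in samples:
        if x > limit:
            y = limit
        else:
            y = x
        out.append(y)
    return out


def clip_if_converted(samples, limit):
    """Same loop after a hypothetical if-conversion: the body is straight-line.

    The comparison produces a predicate p (0 or 1); both arms are computed and
    the predicate selects which value is kept, so the body needs no branch.
    """
    out = []
    for x in samples:
        p = int(x > limit)             # predicate-defining compare
        y = p * limit + (1 - p) * x    # select: limit when p == 1, x when p == 0
        out.append(y)
    return out
```

Both functions return the same list for the same inputs. The second form is the kind of branch-free loop body that could, in principle, be held entirely in a loop buffer.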