HyperLink   Effective Cluster Assignment for Modulo Scheduling
Paper of IMPACT - Cited Greater Than 100 Times
Publication Year:
  Erik M. Nystrom, Alexandre E. Eichenberger
  Proceedings of the 31th International Symposium on Microarchitecture, Dec, 1998

Clustering is one solution to the demand for wide-issue machines and fast clock cycles because it allows for smaller, less ported register files and simpler bypass logic while remaining scaleable. Much of the previous work on scheduling for clustered architectures has focused on acyclic code. While minimizing schedule length of acyclic code is paramount, the primary objective when scheduling cyclic code is to maximize the throughput or steady state performance. This paper investigates a pre-modulo scheduling pass that performs cluster assignment in a way that minimizes performance degradation do to explicit communication required as the loops are split over clusters. The cluster assignment algorithm annotates and adjusts the graph for use by the scheduler so that any traditional modulo scheduling algorithm, having no knowledge of clustering, can produce a valid and efficient schedule for a clustered machine.

The algorithm was tested on various machine configurations including ones with general function units and fully specialized function units. The results show that, given a reasonable number of buses, the assignment algorithm allows for a modulo schedule on a clustered machine equal to that of a unified machine for 98% and 95% of loops on general and fully specialized machines, respectively, for both two and four cluster configurations.