Performance Implications of Synchronization Support for Parallel FORTRAN Programs.
   
Publication Year:
  1994
Authors:
  Sadun Anik, Wen-mei Hwu
   
Published:
  Journal of Parallel and Distributed Computing, Vol. 22, pp. 202-215, 1994
   
Abstract:

This paper studies the performance implications of architectural synchronization support for automatically parallelized numerical programs. As the basis for this work, we analyze the needs for synchronization in automatically parallelized numerical programs. The needs are due to task scheduling, iteration scheduling, barriers, and data dependence handling. We present synchronization algorithms for efficient execution of programs with nested parallel loops. Next, we identify how various forms of hardware synchronization support can be used to satisfy these software synchronization needs. The synchronization primitives studied are test&set, fetch&add, and exchange-byte operations. In addition to these, synchronization bus implementations of lock/unlock and fetch&add operations are also considered. Lastly, we ran experiments to quantify the impact of various forms of architectural support on the performance of a bus-based shared memory multiprocessor running automatically parallelized numerical programs. We found that supporting an atomic fetch&add primitive in shared memory is as effective as supporting lock/unlock operations with a synchronization bus. Both achieve substantial performance improvement over the cases where atomic test&set and exchange-byte operations are supported in shared memory.
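
To make the iteration scheduling need named in the abstract concrete, the sketch below hands out the iterations of one parallel loop in two ways: with a single fetch&add on a shared loop index, and with a test&set spin lock guarding an ordinary index. It is an illustration only and is not taken from the paper. Python has no hardware atomics, so the hypothetical SharedWord class models each primitive with an internal lock, and every name here (SharedWord, make_fetch_add_claim, make_test_set_claim, run_parallel_loop, demo) is an assumption made for this example.

```python
import threading
import time


class SharedWord:
    """One shared-memory word with modeled atomic operations.

    Assumption: an internal Lock stands in for the atomicity that the
    memory system would provide in hardware.
    """

    def __init__(self, value=0):
        self.value = value
        self._atomic = threading.Lock()

    def fetch_and_add(self, delta):
        # Return the old value and add delta in one indivisible step.
        with self._atomic:
            old = self.value
            self.value = old + delta
            return old

    def test_and_set(self):
        # Return the old value and set the word to 1 in one indivisible step.
        with self._atomic:
            old = self.value
            self.value = 1
            return old

    def store(self, value):
        # Ordinary store, serialized with the modeled atomic operations.
        with self._atomic:
            self.value = value


def make_fetch_add_claim():
    """Claim an iteration with a single fetch&add on the loop index."""
    next_iter = SharedWord(0)
    return lambda: next_iter.fetch_and_add(1)


def make_test_set_claim():
    """Claim an iteration by guarding a plain index with a test&set lock."""
    lock = SharedWord(0)
    index = [0]  # ordinary shared variable, protected only by the lock

    def claim():
        while lock.test_and_set() == 1:  # spin while the lock is held
            time.sleep(0)                # yield so the holder can run
        i = index[0]
        index[0] = i + 1
        lock.store(0)                    # release the lock
        return i

    return claim


def run_parallel_loop(num_iterations, num_workers, claim, body):
    """Execute body(i) exactly once for each i in range(num_iterations)."""

    def worker():
        while True:
            i = claim()                  # ask for the next unclaimed iteration
            if i >= num_iterations:
                return                   # loop exhausted
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo(claim_factory, num_iterations=200, num_workers=4):
    """Run the loop and return the sorted list of executed iterations."""
    done = []
    run_parallel_loop(num_iterations, num_workers, claim_factory(), done.append)
    return sorted(done)
```

In this model the fetch&add variant issues one atomic operation per claimed iteration, while the test&set variant needs at least one test&set, a read, and two stores, and may spin when several workers contend for the lock. The sketch shows only that structural difference; it does not reproduce the paper's algorithms or measurements.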