This dissertation focuses on efficient generation of custom processors from
high-level language descriptions. Our work exploits compiler-based optimizations
and transformations in tandem with high-level synthesis (HLS) to build
high-performance custom processors. The goal is to offer a common, multiplatform,
high-abstraction programming interface for heterogeneous compute
systems, so that application developers can exploit the benefits of custom
reconfigurable (or fixed) processors.
The research presented in this dissertation supports the following thesis: In
an increasingly heterogeneous compute environment, it is important to leverage
the compute capabilities of each heterogeneous processor efficiently. In
the case of FPGA and ASIC accelerators, this can be achieved through HLS-based
flows that (i) extract parallelism at granularities coarser than the basic block,
(ii) leverage common high-level parallel programming languages,
and (iii) employ high-level source-to-source transformations to generate
high-throughput custom processors.
First, we propose a novel HLS flow that extracts instruction-level parallelism
from C code beyond the boundaries of basic blocks. Subsequently,
we describe FCUDA, an HLS-based framework for mapping fine-grained and
coarse-grained parallelism from parallel CUDA kernels onto spatial parallelism.
FCUDA provides a common programming model for acceleration
on heterogeneous devices (i.e., GPUs and FPGAs). Moreover, the FCUDA
framework balances multilevel granularity parallelism synthesis using efficient
techniques that leverage fast and accurate estimation models (i.e., models that
do not rely on lengthy physical implementation tools). Finally, we describe an
advanced source-to-source transformation framework for throughput-driven
parallelism synthesis (TDPS), which appropriately restructures CUDA kernel
code to maximize throughput on FPGA devices. We have integrated the
TDPS framework into the FCUDA flow to enable automatic performance
porting of CUDA kernels designed for the GPU architecture onto the FPGA
architecture.