Parboil Benchmarks
The Parboil benchmarks are a set of throughput computing applications useful for
studying the performance of throughput computing architectures and compilers. The
name comes from the culinary term for a partial cooking process, which represents
our belief that useful throughput computing benchmarks must be "cooked",
or preselected to implement a scalable algorithm with fine-grained parallel tasks.
However, useful benchmarks for this field cannot be "fully cooked", because
architectures, programming models, and supporting tools are evolving rapidly
enough that static benchmark codes quickly lose relevance.
We have collected benchmarks from throughput computing application researchers in
many different scientific and commercial fields including image processing, biomolecular
simulation, fluid dynamics, and astronomy. Each benchmark includes several implementations.
Some implementations we provide as readable base implementations from which new
optimization efforts can begin, and others as examples of the current state-of-the-art
targeting specific CPU and GPU architectures. As we continue to optimize these benchmarks
for new and existing architectures ourselves, we will also gladly accept new implementations
and benchmark contributions from developers to recognize those at the frontier of
performance optimization on each architecture.
Finally, by including versions of varying levels of optimization of the same fundamental
algorithm, the benchmarks present opportunities to demonstrate tools and architectures
that help programmers get the most out of their parallel hardware. Less optimized
versions are presented as challenges to the compiler and architecture research communities:
to develop the technology that automatically raises the performance of simpler implementations
to the performance level of sophisticated programmer-optimized implementations,
or to demonstrate any other performance or programmability improvements. We hope that
these benchmarks will facilitate effective demonstrations of such technology.
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser
Anssari, Geng Daniel Liu, Wen-mei W. Hwu
IMPACT Technical Report, IMPACT-12-01, University of Illinois at Urbana-Champaign,
March 2012
Application
| Application | Name | Description |
|---|---|---|
| BFS | Breadth-First Search | Computes the shortest-path cost from a single source to every other reachable node in a graph of uniform edge weights by means of a breadth-first search. |
| CUTCP | Distance-Cutoff Coulombic Potential | Computes the short-range component of Coulombic potential at each grid point over a 3D grid containing point charges representing an explicit-water biomolecular model. |
| HISTO | Saturating Histogram | Computes a moderately large, 2-D saturating histogram with a maximum bin count of 255. Input datasets represent a silicon wafer validation application in which the input points are distributed in a roughly 2-D Gaussian pattern. |
| LBM | Lattice-Boltzmann Method Fluid Dynamics | A fluid dynamics simulation of an enclosed, lid-driven cavity, using the Lattice-Boltzmann Method. |
| MM | Dense Matrix-Matrix Multiply | One of the most widely and intensely studied benchmarks, this application performs a dense matrix multiplication using the standard BLAS format. |
| MRI-GRIDDING | Magnetic Resonance Imaging - Gridding | Computes a regular grid of data representing an MR scan by weighted interpolation of actual acquired data points. The regular grid can then be converted into an image by an FFT. |
| MRI-Q | Magnetic Resonance Imaging - Q | Computes a matrix Q, representing the scanner configuration for calibration, used in 3D magnetic resonance image reconstruction algorithms in non-Cartesian space. |
| SAD | Sum of Absolute Differences | Sum of absolute differences kernel, used in MPEG video encoders. Based on the full-pixel motion estimation algorithm found in the JM reference H.264 video encoder. |
| SPMV | Sparse-Matrix Dense-Vector Multiplication | Computes the product of a sparse matrix with a dense vector. The sparse matrix is read from file in coordinate format and converted to JDS format with configurable padding and alignment for different devices. |
| STENCIL | 3-D Stencil Operation | An iterative Jacobi stencil operation on a regular 3-D grid. |
| TPACF | Two Point Angular Correlation Function | TPACF is used to statistically analyze the spatial distribution of observed astronomical bodies. The algorithm computes a distance between all pairs of input points and generates a histogram summary of the observed distances. |
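To make the "saturating" behavior in the HISTO description concrete, here is a minimal sequential sketch in C. It is not taken from the Parboil sources: the function name `saturating_histogram`, its parameters, and the choice to ignore out-of-range indices are assumptions made for illustration. The 2-D histogram is assumed to be flattened into a single array of 8-bit bins, and each bin stops counting once it reaches 255.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIN_MAX 255u /* maximum bin count given in the HISTO description */

/* Hypothetical helper (not part of Parboil): accumulate n flattened bin
 * indices into num_bins 8-bit counters. A counter that has reached BIN_MAX
 * is left unchanged, so it saturates instead of wrapping around to zero.
 * Indices outside [0, num_bins) are skipped; that policy is an assumption. */
void saturating_histogram(const uint32_t *bin_index, size_t n,
                          uint8_t *histo, size_t num_bins)
{
    memset(histo, 0, num_bins);
    for (size_t i = 0; i < n; ++i) {
        uint32_t b = bin_index[i];
        if (b < num_bins && histo[b] < BIN_MAX) {
            ++histo[b];
        }
    }
}
```

A parallel version has to resolve concurrent updates to the same bin, which is part of what makes this kernel interesting on throughput hardware; the loop above only pins down the expected result.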
Features
The Parboil benchmarking infrastructure includes support for compiling, executing,
and validating various implementations of specific benchmarks. New implementations
are as simple as creating an additional source directory and Makefile within the
benchmark's source folder. See the README files for additional information.
We have released the following implementations of each benchmark:
| Benchmark | base | Cuda-base | Cuda-fermi | Cuda-generic | Ocl-base | Omp-base |
|---|---|---|---|---|---|---|
| cutcp | X | X | | X | X | X |
| mri-q | X | X | | X | X | X |
| mri-grid | X | X | | X | X | X |
| sad | X | X | | X | X | X |
| stencil | X | X | X | X | X | X |
| tpacf | X | X | | X | X | X |
| lbm | X | X | | X | X | X |
| sgemm | X | X | X | X | X | X |
| spmv | X | X | | X | X | X |
| bfs | X | X | | X | X | X |
| histogram | X | X | X | | X | X |
Download
The Parboil benchmark suite is now available under the
Illinois Open Source License agreement. Proceed to the
Download Page.