## Parboil Benchmarks

The Parboil benchmarks are a set of throughput computing applications useful for studying the performance of throughput computing architectures and compilers. The name comes from the culinary term for a partial cooking process, which represents our belief that useful throughput computing benchmarks must be "cooked", or preselected to implement a scalable algorithm with fine-grained parallel tasks. But useful benchmarks for this field cannot be "fully cooked", because the architectures, programming models, and supporting tools are evolving rapidly enough that static benchmark codes will lose relevance very quickly.

We have collected benchmarks from throughput computing application researchers in many different scientific and commercial fields including image processing, biomolecular simulation, fluid dynamics, and astronomy. Each benchmark includes several implementations. Some implementations we provide as readable base implementations from which new optimization efforts can begin, and others as examples of the current state-of-the-art targeting specific CPU and GPU architectures. As we continue to optimize these benchmarks for new and existing architectures ourselves, we will also gladly accept new implementations and benchmark contributions from developers to recognize those at the frontier of performance optimization on each architecture.

Finally, by including versions of varying levels of optimization of the same fundamental algorithm, the benchmarks present opportunities to demonstrate tools and architectures that help programmers get the most out of their parallel hardware. Less optimized versions are presented as challenges to the compiler and architecture research communities: to develop the technology that automatically raises the performance of simpler implementations to the performance level of sophisticated programmer-optimized implementations, or demonstrate any other performance or programmability improvements. We hope that these benchmarks will facilitate effective demonstrations of such technology.

## The Parboil technical report (PDF)

John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, Wen-mei W. Hwu

IMPACT Technical Report, IMPACT-12-01, University of Illinois at Urbana-Champaign, March 2012

## Applications

| Application | Description |
|---|---|
| BFS (Breadth-First Search) | Computes the shortest-path cost from a single source to every other reachable node in a graph with uniform edge weights by means of a breadth-first search. |
| CUTCP (Distance-Cutoff Coulombic Potential) | Computes the short-range component of Coulombic potential at each grid point over a 3D grid containing point charges representing an explicit-water biomolecular model. |
| HISTO (Saturating Histogram) | Computes a moderately large, 2-D saturating histogram with a maximum bin count of 255. Input datasets represent a silicon wafer validation application in which the input points are distributed in a roughly 2-D Gaussian pattern. |
| LBM (Lattice-Boltzmann Method Fluid Dynamics) | A fluid dynamics simulation of an enclosed, lid-driven cavity, using the Lattice-Boltzmann Method. |
| MM (Dense Matrix-Matrix Multiply) | One of the most widely and intensely studied benchmarks, this application performs a dense matrix multiplication using the standard BLAS format. |
| MRI-GRIDDING (Magnetic Resonance Imaging - Gridding) | Computes a regular grid of data representing an MR scan by weighted interpolation of actual acquired data points. The regular grid can then be converted into an image by an FFT. |
| MRI-Q (Magnetic Resonance Imaging - Q) | Computes a matrix Q, representing the scanner configuration for calibration, used in 3D magnetic resonance image reconstruction algorithms in non-Cartesian space. |
| SAD (Sum of Absolute Differences) | Sum of absolute differences kernel, used in MPEG video encoders. Based on the full-pixel motion estimation algorithm found in the JM reference H.264 video encoder. |
| SPMV (Sparse-Matrix Dense-Vector Multiplication) | Computes the product of a sparse matrix with a dense vector. The sparse matrix is read from file in coordinate format and converted to JDS format with configurable padding and alignment for different devices. |
| STENCIL (3-D Stencil Operation) | An iterative Jacobi stencil operation on a regular 3-D grid. |
| TPACF (Two Point Angular Correlation Function) | TPACF is used to statistically analyze the spatial distribution of observed astronomical bodies. The algorithm computes a distance between all pairs of input points and generates a histogram summary of the observed distances. |
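
To make three of the descriptions above concrete, here is a minimal sequential sketch of what BFS, HISTO, and SPMV compute. It is not taken from the Parboil sources: the function names, argument layouts, and the choice of Python are assumptions made for illustration, and the SpMV sketch multiplies directly from coordinate format rather than performing the JDS conversion the benchmark uses.

```python
from collections import deque


def bfs_costs(adjacency, source):
    """Hop count from `source` to every reachable node (uniform edge weights).

    `adjacency` is a hypothetical dict mapping a node to its neighbors.
    """
    cost = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in cost:  # the first visit is the shortest path
                cost[neighbor] = cost[node] + 1
                frontier.append(neighbor)
    return cost


def saturating_histogram(points, width, height, max_count=255):
    """2-D histogram whose bins stop incrementing once they reach `max_count`."""
    bins = [[0] * width for _ in range(height)]
    for x, y in points:
        if bins[y][x] < max_count:
            bins[y][x] += 1
    return bins


def spmv_coo(rows, cols, vals, x, num_rows):
    """y = A * x for a sparse matrix A stored as coordinate (COO) triples."""
    y = [0.0] * num_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y
```

Each function is a plain loop over its input, which is roughly the role a readable base implementation plays: a correctness reference against which parallel versions can be validated. Unreachable nodes are simply absent from the dictionary returned by `bfs_costs`.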

## Features

The Parboil benchmarking infrastructure includes support for compiling, executing, and validating various implementations of specific benchmarks. Adding a new implementation is as simple as creating an additional source directory and Makefile within the benchmark's source folder. See the README files for additional information. We have currently released the following implementations of each benchmark:

The six implementation variants are base, Cuda-base, Cuda-fermi, Cuda-generic, Ocl-base, and Omp-base.

| Benchmark | Variants released |
|---|---|
| cutcp | 5 of 6 |
| mri-q | 5 of 6 |
| mri-grid | 5 of 6 |
| sad | 5 of 6 |
| stencil | 6 of 6 |
| tpacf | 5 of 6 |
| lbm | 5 of 6 |
| sgemm | 6 of 6 |
| spmv | 5 of 6 |
| bfs | 5 of 6 |
| histogram | 5 of 6 |

## Download

The Parboil benchmark suite is now available under the Illinois Open Source License agreement. Proceed to the Download Page.