The objective of IMPACT (Illinois Microarchitecture Project using Algorithms and Compiler Technology) is to provide critical research, architecture innovation, and algorithm and compiler prototypes for heterogeneous parallel architectures. We achieve portable performance and energy efficiency for emerging real-world applications by developing novel hardware, compiler, and algorithmic solutions.


Recent & Highlighted Items

SC20 Paper Becomes a Best Paper & Best Student Paper Finalist (September 4, 2020)

Mert Hidayetoglu's work, a follow-up to his internship at Argonne National Laboratory, has been nominated for the Best Paper and Best Student Paper awards at SC20, part of the Supercomputing conference series. The work addresses iterative reconstruction of 3D X-ray tomography at unprecedented scale. Mert's code scales well up to 24,576 V100 GPUs on the Summit supercomputer and reconstructs an 11Kx11Kx9K multi-scale mouse brain image in under three minutes. The reconstruction reaches 65 PFLOPS of sustained single-precision throughput, 34% of Summit's theoretical peak performance.

The technical highlight of Mert's paper is a hierarchical communication strategy that alleviates the communication bottleneck of distributed sparse matrix multiplication: a few additional, very fast intra-node communications reduce the slow inter-node communication volume by 60%. After the APS upgrade, Mert's code will be used in production on Aurora, the world's first exascale computer, which has a multi-GPU node architecture.
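
As a rough illustration of why node-level aggregation can cut inter-node traffic, the toy model below counts vector elements that must cross node boundaries in a distributed sparse product. It is a minimal sketch under stated assumptions, not the algorithm from the paper: the function name `communication_volumes`, the `needs` and `owner` inputs, and the idea of de-duplicating requests per node are all hypothetical choices made for this example.

```python
def communication_volumes(needs, gpus_per_node, owner):
    """Count inter-node element transfers for two strategies (toy model).

    needs[g]      -- set of vector indices that GPU g must read (assumed input)
    gpus_per_node -- number of GPUs sharing one node
    owner(i)      -- GPU that stores vector index i (assumed mapping)

    Returns (direct, hierarchical):
      direct       -- every GPU fetches each remote element itself
      hierarchical -- each node fetches a remote element once, then shares
                      it among its GPUs over the fast intra-node links
    """
    def node_of(gpu):
        return gpu // gpus_per_node

    direct = 0
    fetched_per_node = {}  # node -> set of remote indices that node needs
    for gpu, wanted in enumerate(needs):
        for index in wanted:
            if node_of(owner(index)) != node_of(gpu):
                direct += 1
                fetched_per_node.setdefault(node_of(gpu), set()).add(index)

    hierarchical = sum(len(indices) for indices in fetched_per_node.values())
    return direct, hierarchical


# Two nodes with two GPUs each; GPU g owns indices 2*g and 2*g + 1.
needs = [{4, 5, 6}, {0, 4, 5}, {0, 1}, {0, 6}]
direct, hierarchical = communication_volumes(needs, 2, lambda i: i // 2)
print(direct, hierarchical)
```

In this small example the GPUs on a node often need the same remote elements, so fetching each one once per node lowers the inter-node count. The actual savings depend on the sparsity pattern and on how the data is partitioned.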

Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes (PDF)

IBM-Illinois Team Wins the MIT/Amazon/IEEE Sparse DNN Graph Challenge (August 26, 2020)

The team (Mert Hidayetoglu, Carl Pearson, Vikram Mailthody, Jinjun Xiong, Rakesh Nagi, and Wen-mei Hwu) of the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) won the MIT/Amazon/IEEE Sparse DNN Graph Challenge 2020. The team developed efficient GPU algorithms that use on-chip memory to save energy and time on unstructured data accesses in sparse computation. The proposed implementation reduces inference latency by an order of magnitude compared to the 2019 winner. Their paper includes performance benchmarking on 12 sparse deep neural network models of various sizes and demonstrates an at-scale sustained inference throughput of 180 TeraEdges/second on the Summit supercomputer. Thanks to Eiman Ebrahimi of NVIDIA, the paper also includes the first performance benchmarking of the latest-generation Ampere A100 GPU in the literature. They will present their work at HPEC'20 in September.

At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation (PDF)

SC20 Student Cluster Reproducibility Committee Selects MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization (April 15, 2020)

The SC20 Reproducibility Committee has selected the SC19 paper MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization to serve as the Student Cluster Competition (SCC) benchmark for this year's Reproducibility Challenge. The authors and the Reproducibility Committee have been working to create a reproducible benchmark that builds on the paper's results. At SC20, the sixteen SCC teams will be asked to run the benchmark, replicating the findings from the original paper under different settings and with different datasets.

MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization (PDF)

Omer Anjum Receives Best Paper Award for GPU Work with 3D Stencils (November 19, 2019)

CSL postdoc Omer Anjum, a member of the IMPACT group led by CSL Professor Wen-mei Hwu, recently wrote a paper on high-order stencils titled An Efficient GPU Implementation Technique for Higher-Order 3D Stencils. The publication, which outlines a method of reusing data inside a GPU to make better use of memory bandwidth, received a Best Paper Award at the International Conference on High Performance Computing and Communications (HPCC).

Hwu extends GPU principles in general parallel computing applications (November 7, 2019)

Some computations asked of modern hardware are so complex that multiple processors are needed to parallelize the task being performed. According to an article from Built In, Nvidia approached ECE ILLINOIS Professor Wen-mei Hwu, AMD Jerry Sanders Chair of Electrical and Computer Engineering, to help extend its GPU designs into general parallel computing applications.

How Parallel Processing Solves Our Biggest Computational Problems (November 7, 2019)

Take all the help you can get.

If parallel computing has a central tenet, that might be it. Some of the crazy-complex computations asked of today's hardware are so demanding that the compute burden must be borne by multiple processors, effectively parallelizing whatever task is being performed. The result? Slashed latencies and turbocharged completion times.

Perhaps the most notable push toward parallelism happened around 2006, when tech hardware powerhouse Nvidia approached Wen-mei Hwu, a professor of electrical and computer engineering at the University of Illinois Urbana-Champaign. Nvidia was designing graphics processing units (GPUs) which, thanks to large numbers of threads and cores, had far higher memory bandwidth than traditional central processing units (CPUs), as a way to process huge numbers of pixels.

Student Innovation Award and Honorable Mentions at the IEEE HPEC Graph Challenge (September 25, 2019)

The IMPACT group graph challenge team (Omer Anjum, Carl Pearson, Mohammad Almasri, Sitao Huang, Vikram Mailthody, Zaid Qureshi, Professor Wen-mei Hwu) and collaborators (Jinjun Xiong of IBM Watson Research and Professor Rakesh Nagi of Illinois Industrial and Systems Engineering) received a student innovation award (led by Mohammad) and two honorable mentions (led by Carl and Sitao) at IEEE High Performance Extreme Computing 2019!

Abdul Dakkak will be presenting D4P at the OpenPower Summit (August 19, 2019)

D4P: The Power Platform for Docker Online Container Authoring

The aim of D4P is to enrich the Power container ecosystem by providing a platform both for developers to create Docker containers and for the Power community to find Docker images. Already, we have built and published over 200 Docker images that are available in the D4P image catalog. User contribution is key to extending D4P's catalog. D4P is available online and slated to be the hub for the Power community to create, discover, and use Docker images.

(View Archive of Highlighted Items)