2022
 

Our PyTorch-Direct Work Upstreamed to the AWS Deep Graph Library (March 1, 2022)

The GPU-oriented data communication architecture proposed in our previous works, "Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture" and "PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses," has been officially accepted and merged into the AWS Deep Graph Library (DGL).

Our work highlights the use of the zero-copy access capability of NVIDIA GPUs to improve data access efficiency for large sparse datasets, which is a perfect fit for large-scale graph neural network (GNN) training. Our work improved GNN training speed by about 1.5x - 4.2x across various models and single-/multi-GPU training setups. There are active discussions about further expanding our work to multiple areas of GNN training, so please visit https://github.com/dmlc/dgl for more information!
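The underlying mechanism can be illustrated with plain CUDA zero-copy (mapped, pinned host) memory: GPU threads read only the irregularly accessed feature rows directly from host memory instead of bulk-copying the whole feature table. The sketch below is illustrative only; the kernel, buffer names, and sizes are hypothetical, and this is not the DGL/PyTorch-Direct code path.

```cuda
// Zero-copy gather sketch: the large feature table stays in mapped, pinned
// host memory and GPU threads fetch only the sampled rows over the bus.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void gather(const float *features, const long *indices,
                       float *out, int num_sampled, int feat_dim) {
  int i = blockIdx.x;  // one block per sampled node (hypothetical layout)
  for (int j = threadIdx.x; j < feat_dim; j += blockDim.x)
    out[i * feat_dim + j] = features[indices[i] * feat_dim + j];
}

int main() {
  const int num_nodes = 1 << 20, feat_dim = 128, num_sampled = 1024;
  float *h_feat;
  cudaHostAlloc(&h_feat, sizeof(float) * (size_t)num_nodes * feat_dim,
                cudaHostAllocMapped);            // host-resident, GPU-visible
  long *d_idx; float *d_out;
  cudaMalloc(&d_idx, sizeof(long) * num_sampled);
  cudaMalloc(&d_out, sizeof(float) * num_sampled * feat_dim);
  cudaMemset(d_idx, 0, sizeof(long) * num_sampled);  // placeholder sample IDs
  // ... in practice, fill h_feat with features and d_idx with sampled nodes ...
  float *d_feat;
  cudaHostGetDevicePointer(&d_feat, h_feat, 0);  // device alias of host buffer
  gather<<<num_sampled, 128>>>(d_feat, d_idx, d_out, num_sampled, feat_dim);
  cudaDeviceSynchronize();
  printf("gather: %s\n", cudaGetErrorString(cudaGetLastError()));
  return 0;
}
```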

2020
 

IMPACT Group Members and IBM Collaborators Win IPDPS 2020 Best Paper Award (November 12, 2020)

Cheng Li, Abdul Dakkak, Jinjun Xiong, and Wen-mei Hwu are winners of the IEEE IPDPS 2020 Best Paper Award. Their paper, entitled "XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs", proposes XSP, an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. The authors couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. The paper demonstrates that XSP provides insights which would be difficult to discern otherwise.


SC20 Paper Named a Best Paper & Best Student Paper Finalist (September 4, 2020)


Mert Hidayetoglu's follow-up work from his internship at Argonne National Laboratory has been nominated for the Best Paper and Best Student Paper awards at SC20, part of the supercomputing conference series. His work is on iterative reconstruction of 3D X-ray tomography at unprecedented scale. Mert's code scales well up to 24,576 V100 GPUs on the Summit supercomputer and reconstructs an 11K x 11K x 9K multi-scale mouse brain image in under three minutes. The reconstruction reaches 65 PFLOPS sustained single-precision throughput: 34% of Summit's theoretical peak performance.


The technical highlight of Mert's paper is the hierarchical communication strategy that alleviates the communication bottleneck of distributed projection and backprojection computations: a few additional, very fast intra-node communications reduce the slow inter-node communication volume by 60%. Upon the APS upgrade, Mert's code will be used for production on Aurora, the world's first exascale computer, with its multi-GPU node architecture.
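The general pattern, reducing within each node over fast links before touching the slower network, can be sketched with MPI communicator splitting. The host-side example below is a generic hierarchical reduction with assumed buffer names and sizes, not the paper's projection/backprojection code.

```cuda
// Hierarchical reduction sketch: intra-node first, then inter-node among
// node leaders, then an intra-node broadcast. Buffer contents are placeholders.
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Ranks sharing a node form one communicator (fast shared memory / NVLink).
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                      MPI_INFO_NULL, &node_comm);
  int node_rank;
  MPI_Comm_rank(node_comm, &node_rank);

  // A communicator containing only the leader (local rank 0) of each node.
  MPI_Comm leader_comm;
  MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                 world_rank, &leader_comm);

  const int n = 1 << 20;
  std::vector<float> partial(n, 1.0f), node_sum(n), global_sum(n);

  // Step 1: cheap intra-node reduction onto the node leader.
  MPI_Reduce(partial.data(), node_sum.data(), n, MPI_FLOAT, MPI_SUM, 0, node_comm);

  // Step 2: only leaders take part in the slow inter-node exchange, shrinking
  // inter-node traffic by roughly the ranks-per-node factor.
  if (node_rank == 0)
    MPI_Allreduce(node_sum.data(), global_sum.data(), n, MPI_FLOAT, MPI_SUM,
                  leader_comm);

  // Step 3: redistribute the result within each node over fast links.
  MPI_Bcast(global_sum.data(), n, MPI_FLOAT, 0, node_comm);

  MPI_Finalize();
  return 0;
}
```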


Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes (PDF)
  

IBM-Illinois Team Wins the MIT/Amazon/IEEE Sparse DNN Graph Challenge (August 26, 2020)

The team (Mert Hidayetoglu, Carl Pearson, Vikram Mailthody, Jinjun Xiong, Rakesh Nagi, and Wen-mei Hwu) of the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) won the MIT/Amazon/IEEE Sparse DNN Graph Challenge 2020. The team developed efficient GPU algorithms that use on-chip memory to save energy and time for unstructured data accesses in sparse computation. The proposed implementation reduces inference latency by an order of magnitude compared to the 2019 winner. Their paper includes performance benchmarking on 12 sparse deep neural network models of various sizes and demonstrates a sustained at-scale inference throughput of 180 TeraEdges/second on the Summit supercomputer. Thanks to Eiman Ebrahimi of NVIDIA, the paper also includes the first published performance benchmarking of the latest-generation Ampere A100 GPU. They will present their work at HPEC'20 in September.


At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation (PDF)
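As a generic illustration of the on-chip memory idea mentioned above, the sketch below stages the reused input activations of one sparse layer in shared memory while streaming CSR-format weights. The kernel, tiling scheme, and names are hypothetical and do not reproduce the team's Graph Challenge implementation.

```cuda
// Sparse layer inference sketch (CSR weights, ReLU): the reused input vector
// is staged tile-by-tile in on-chip shared memory. Generic illustration only.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

#define TILE 1024

__global__ void sparse_layer(const int *row_ptr, const int *col_idx,
                             const float *val, const float *x, float *y,
                             int rows, int cols) {
  __shared__ float x_s[TILE];
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  float acc = 0.0f;
  for (int base = 0; base < cols; base += TILE) {
    // Cooperative load of one tile of x into shared memory.
    for (int j = threadIdx.x; j < TILE && base + j < cols; j += blockDim.x)
      x_s[j] = x[base + j];
    __syncthreads();
    if (row < rows)
      for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p) {
        int c = col_idx[p];
        if (c >= base && c < base + TILE)  // nonzeros falling in this tile
          acc += val[p] * x_s[c - base];
      }
    __syncthreads();
  }
  if (row < rows) y[row] = fmaxf(acc, 0.0f);  // ReLU
}

int main() {
  const int n = 4096;  // toy layer: n x n identity weight matrix in CSR form
  std::vector<int> h_ptr(n + 1), h_col(n);
  std::vector<float> h_val(n, 1.0f), h_x(n, 2.0f);
  for (int i = 0; i <= n; ++i) h_ptr[i] = i;
  for (int i = 0; i < n; ++i) h_col[i] = i;

  int *d_ptr, *d_col; float *d_val, *d_x, *d_y;
  cudaMalloc(&d_ptr, (n + 1) * sizeof(int));
  cudaMalloc(&d_col, n * sizeof(int));
  cudaMalloc(&d_val, n * sizeof(float));
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_y, n * sizeof(float));
  cudaMemcpy(d_ptr, h_ptr.data(), (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_col, h_col.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_val, h_val.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  sparse_layer<<<(n + 255) / 256, 256>>>(d_ptr, d_col, d_val, d_x, d_y, n, n);
  cudaDeviceSynchronize();
  float y0; cudaMemcpy(&y0, d_y, sizeof(float), cudaMemcpyDeviceToHost);
  printf("y[0] = %f (expect 2.0)\n", y0);
  return 0;
}
```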
  

SC20 Student Cluster Reproducibility Committee Selects MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization (April 15, 2020)

The SC20 Reproducibility Committee has selected the SC19 paper MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization to serve as the Student Cluster Competition (SCC) benchmark for the Reproducibility Challenge this year. The authors and the Reproducibility Committee have been working to create a reproducible benchmark that builds on the paper's results. At SC20, the sixteen SCC teams will be asked to run the benchmark, replicating the findings from the original paper under different settings and with different datasets.


MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization (PDF)
  
2019
 

Omer Anjum Receives Best Paper Award for GPU Work with 3D Stencils (November 19, 2019)

CSL postdoc Omer Anjum, a member of the IMPACT group led by CSL Professor Wen-mei Hwu, recently wrote a paper on his work on high-order stencils titled "An Efficient GPU Implementation Technique for Higher-Order 3D Stencils." The publication, which outlines a method of reusing data inside the GPU to make better use of memory bandwidth, received a Best Paper Award at the International Conference on High Performance Computing and Communications (HPCC).
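The on-chip data reuse idea can be illustrated with a simple low-order stencil: each thread marches along z keeping its z-neighbors in registers while the current x-y tile sits in shared memory. This is a generic 7-point sketch with assumed tile sizes, not the higher-order kernel from the paper.

```cuda
// 3D 7-point stencil sketch: register rotation along z plus a shared-memory
// x-y tile, so neighbor values are reread from on-chip storage, not DRAM.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

#define BX 16
#define BY 16

// Assumes nx and ny are multiples of BX and BY (no thread exits early).
__global__ void stencil7(const float *in, float *out, int nx, int ny, int nz) {
  __shared__ float tile[BY + 2][BX + 2];
  int ix = blockIdx.x * BX + threadIdx.x;
  int iy = blockIdx.y * BY + threadIdx.y;
  int tx = threadIdx.x + 1, ty = threadIdx.y + 1;
  long plane = (long)nx * ny, idx = (long)iy * nx + ix;

  float below = in[idx];           // z-1 value, kept in a register
  float center = in[idx + plane];  // z value
  for (int z = 1; z < nz - 1; ++z) {
    float above = in[idx + (z + 1) * plane];  // z+1 value

    tile[ty][tx] = center;  // stage the current x-y plane in shared memory
    if (threadIdx.x == 0 && ix > 0)           tile[ty][0]      = in[idx + z * plane - 1];
    if (threadIdx.x == BX - 1 && ix < nx - 1) tile[ty][BX + 1] = in[idx + z * plane + 1];
    if (threadIdx.y == 0 && iy > 0)           tile[0][tx]      = in[idx + z * plane - nx];
    if (threadIdx.y == BY - 1 && iy < ny - 1) tile[BY + 1][tx] = in[idx + z * plane + nx];
    __syncthreads();

    if (ix > 0 && ix < nx - 1 && iy > 0 && iy < ny - 1)
      out[idx + z * plane] = 0.4f * center + 0.1f * (below + above +
          tile[ty][tx - 1] + tile[ty][tx + 1] + tile[ty - 1][tx] + tile[ty + 1][tx]);
    __syncthreads();

    below = center;   // rotate registers along z
    center = above;
  }
}

int main() {
  const int nx = 64, ny = 64, nz = 64;
  size_t bytes = sizeof(float) * nx * ny * nz;
  std::vector<float> h_in(nx * ny * nz, 1.0f);
  float *d_in, *d_out;
  cudaMalloc(&d_in, bytes);
  cudaMalloc(&d_out, bytes);
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
  dim3 block(BX, BY), grid(nx / BX, ny / BY);
  stencil7<<<grid, block>>>(d_in, d_out, nx, ny, nz);
  cudaDeviceSynchronize();
  printf("stencil: %s\n", cudaGetErrorString(cudaGetLastError()));
  return 0;
}
```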


Hwu extends GPU principles in general parallel computing applications (November 7, 2019)

The computations demanded of modern hardware are so complex that they require multiple processors working in parallel on the task being performed. According to an article from Built In, Nvidia approached ECE ILLINOIS Professor Wen-mei Hwu, AMD Jerry Sanders Chair of Electrical and Computer Engineering, to help extend its GPU designs into general parallel computing applications.


How Parallel Processing Solves Our Biggest Computational Problems (November 7, 2019)

Take all the help you can get.

If parallel computing has a central tenet, that might be it. Some of the crazy-complex computations asked of today's hardware are so demanding that the compute burden must be borne by multiple processors, effectively parallelizing whatever task is being performed. The result? Slashed latencies and turbocharged completion times.

Perhaps the most notable push toward parallelism happened around 2006, when tech hardware powerhouse Nvidia approached Wen-mei Hwu, a professor of electrical and computer engineering at the University of Illinois at Urbana-Champaign. Nvidia was designing graphics processing units (GPUs), which, thanks to large numbers of threads and cores, had far higher memory bandwidth than traditional central processing units (CPUs), as a way to process huge numbers of pixels.

Student Innovation Award and Honorable Mentions at the IEEE HPEC Graph Challenge (September 25, 2019)

The IMPACT group graph challenge team (Omer Anjum, Carl Pearson, Mohammad Almasri, Sitao Huang, Vikram Mailthody, Zaid Qureshi, Professor Wen-Mei Hwu) and collaborators (Jinjun Xiong of IBM Watson Research, and Professor Rakesh Nagi of Illinois Industrial and Systems Engineering) received a Student Innovation Award (led by Mohammad) and two honorable mentions (led by Carl and Sitao) at IEEE High Performance Extreme Computing 2019!


Abdul Dakkak will be presenting D4P at the OpenPower Summit (August 19, 2019)

D4P: The Power Platform for Docker Online Container Authoring

The aim of D4P is to enrich the Power container ecosystem by providing both a platform for developers to create Docker containers and a place for the Power community to find Docker images. Already, we have built and published over 200 Docker images that are available in the D4P image catalog. User contribution is key to extending D4P's catalog. D4P is available online at https://dockerfile-builder.mybluemix.net and is slated to be the hub for the Power community to create, discover, and use Docker images.

IMPACT Group Wins HPCC Best Paper Award! (August 10, 2019)


Omer Anjum, Simon Garcia de Gonzalo, Mert Hidayetoglu, and Wen-mei Hwu have been awarded the Best Paper Award for their work on "An Efficient GPU Implementation for Higher-Order 3D Stencils". Prof. Hwu presented the paper at the 21st IEEE International Conference on High Performance Computing and Communications (HPCC).

Cheng Li and Abdul Dakkak will be presenting the "MLPerf-Bench: Benchmarking Deep Learning Systems" Tutorial at ISCA'19 (June 22, 2019)

Cheng Li and Abdul Dakkak will present MLModelScope again as part of the MLPerf Bench Tutorial at ISCA. This follows a successful, packed-room presentation at ASPLOS'19 in March.

The goal of the tutorial is to bring experts from industry and academia together to foster systematic development, reproducible evaluation, and performance analysis of deep learning artifacts. It seeks to address the following questions:
  1. What are the benchmarks that can effectively capture the scope of the ML/DL domain?
  2. Are the existing frameworks sufficient for this purpose?
  3. What are some of the industry-standard evaluation platforms or harnesses?
  4. What are the metrics for carrying out an effective comparative evaluation?

Carl and Simon present at ICPE'19 (April 10, 2019)

Simon Garcia de Gonzalo and Carl Pearson presented two papers at the International Conference on Performance Engineering in Mumbai, India! Simon presented "Collaborative Computing on Heterogeneous CPU-FPGA Architectures Using OpenCL" and Carl presented "Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects."


Carl Pearson, Sitao Huang, Vikram Mailthody, and Mert Hidayetoglu win department awards and fellowships (March 26, 2019)

Four IMPACT students received Electrical Computer Engineering or Computer Science department fellowships or awards!

Carl Pearson: E. A. Reid Fellowship
Sitao Huang: Sundaram Seshu International Student Fellowship
Vikram Sharma Mailthody: Joan and Lalit Bahl Fellowship
Mert Hidayetoglu: P. D. Coleman Outstanding Research Award


2018
 

Mert Hidayetoglu Wins Second Place at the 2018 IEEE IPDPS Ph.D. Forum

There were 32 participants from 14 countries at the 2018 IEEE IPDPS Ph.D. Forum poster competition. Mert Hidayetoglu's poster presentation, entitled "Large and Massively-Parallel Image Reconstruction Accelerated with the Multilevel Fast Multipole Algorithm", won second place by popular vote of the IPDPS attendees.
Congratulations, Mert!

Slides (PDF)
  

Carl Pearson Speaks at Blue Waters Symposium (June 6, 2018)

Carl Pearson gave a presentation at the Blue Waters Symposium titled "Bigger GPUs and Bigger Nodes". The Blue Waters Symposium brings together world leaders in petascale computational science and engineering.


Bigger GPUs and Bigger Nodes (PDF)
  

Mert Hidayetoglu and Carl Pearson Receive 2017-18 Dan Vivoli Endowed Fellowship

This fellowship is given to students interested in the use of a Graphics Processing Unit (GPU) in parallel computing, the use of a GPU in solving high performance computing problems, and the development of applications for the GPU.

Sitao Huang Receives 2017-18 Rambus Computer Engineering Fellowship

Dr. Michael Farmwald donated stock from Rambus, Inc. to the University of Illinois to be used to provide funds for the Department of Electrical and Computer Engineering. The Rambus Computer Engineering Fellowship has been established to recognize a graduate student in the Computer Engineering Area for outstanding research.

Wen-Mei Hwu Receives Rose Award for Teaching Excellence (February 15, 2018)

Wen-Mei Hwu was awarded the Rose Award for Teaching Excellence. This award was established by an anonymous alumnus to recognize good teaching and is intended to foster and reward excellence in undergraduate teaching across the College of Engineering. The recipients of this award are teachers who excel at motivating undergraduate students to learn and to appreciate engineering.

Professor Wen-Mei W. Hwu, AMD Jerry Sanders Chair of Electrical and Computer Engineering and CSL researcher, is an internationally renowned researcher with an impressive list of accolades and accomplishments. Even so, some of his greatest contributions reside under the umbrella of instruction and mentorship, and they have now been recognized with this year's Rose Award for Teaching Excellence. His teaching reputation is widely recognized by industry. Recruiters from top companies such as HP, Intel, AMD, DEC, and Motorola explicitly ask our students if they have taken the Computer Organization and Design course with Prof. Hwu, a distinct advantage for those who have.




2017
 

IEEE CEDA Distinguished Lecture Series

Wen-Mei Hwu spoke at the IEEE CEDA (Central Illinois chapter) Distinguished Lecture Series. The lecture took place in the auditorium of the University of Illinois Coordinated Science Laboratory.

Abstract: "We have been experiencing two very important developments in computing. On one hand, a tremendous amount of resources have been invested into innovative applications such as deep learning and cognitive computing. On the other hand, traditional computer architectures have run out of steam in meeting the demands of these innovative applications. The combination of challenging demands from innovation application and inability of traditional computing systems to satisfy these demands send system designers scrambling. As a result, the boundary between hardware design and software development is quickly diminishing: many software applications have to leverage hardware accelerators and hardware accelerators must behave like software components for popular programming languages. This convergence will likely accelerate in the coming decade and reshape the landscape of EDA tools, programming systems, and system architectures. In this talk, I will present the lessons that we learned from addressing heterogeneous system design challenges as well as some open research questions."

Pamphlet: (PDF)
  

Rebooting the Data Access Hierarchy in Computing Systems (November 9, 2017)

Wen-mei Hwu was invited to speak at the Second IEEE International Conference on Rebooting Computing (ICRC 2017) in McLean, Virginia, addressing the subjects of parallel and exascale computing.

Slides (PDF)
  


GTC DC 2017 (November 2, 2017)

The IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) is developing scalable cognitive solutions that embody both advanced cognitive computing workloads and optimized heterogeneous computing systems for these cognitive workloads. The two streams of research not only complement, but also empower, each other, and thus should be carried out together in a tightly integrated fashion. We envision that technologies developed at the Center will dramatically accelerate the creation and deployment of large-scale, high-impact cognitive applications (by enabling algorithmic researchers to productively use novel parallel technologies), be instrumental in the industry's transition into the new era of cognitive computing, and benefit the next generation's education and learning experience.

Slides (PDF)
  

Keynote at Micro '50, Accelerated Diversity Workshop (October 14, 2017)

Wen-Mei Hwu gave the keynote address at the Micro '50 "Accelerated Diversity Workshop". The title of his talk was "C3SR Cloud Tools and Services for Heterogeneous Cognitive Computing Systems".

Slides (PDF)
  

Keynote at International Symposium on Low Power Electronics and Design 2017 (July 26, 2017)

Wen-mei Hwu delivered the keynote address at ISLPED 2017, held in Taipei, Taiwan from July 24th to July 26th, 2017.

PUMPS 2017 (Barcelona, Spain) (June 26, 2017)

In its eighth edition, the Programming and tUning Massively Parallel Systems summer school (PUMPS) offers researchers and graduate students a unique opportunity to improve their skills with cutting-edge techniques and hands-on experience in developing and tuning applications for many-core processors with massively parallel computing resources like GPU accelerators.

The summer school is oriented towards advanced programming and optimizations, and thus previous experience in basic GPU programming will be considered in the selection process. We will also consider the current parallel applications and numerical methods you are familiar with, and the specific optimizations you would like to discuss.

Andy Schuh Speaks at WCAE 2017 (June 24, 2017)

Andy Schuh attended the full-day WCAE Workshop in Toronto, Canada and gave a talk introducing NVIDIA's Teaching Kit Program, specifically the GPU Teaching Kit for Accelerated Computing, which was co-created by Dr. Wen-Mei Hwu and NVIDIA.

Slides (PDF)
  

CEM 2017 Keynote Address (June 23, 2017)

Wen-mei Hwu delivered a keynote address at CEM 2017 in June on the relationship between the complexity of problems the computing industry can solve and the size of its computing systems, and how this relationship can be harnessed to better optimize both.

Abstract (PDF)
Slides (PDF)
  

Mert Hidayetoglu Selected as CSE Fellow

This year 12 CSE faculty members, representing a broad spectrum of research interests, evaluated 39 CSE Fellow proposals. The proposed research must be interdisciplinary and computationally oriented. In the end, 8 proposals were selected for funding for the 2017-18 school year. CSE Fellows will present their research results at the annual CSE Symposium at the end of the school year.

Abdul Dakkak Presents at GTC 2017 (May 8, 2017)

A major component of many advanced programming courses is an open-ended "end-of-term project" assignment. Delivering and evaluating open-ended parallel programming projects for hundreds or thousands of students requires broad system reconfigurability and brings challenges of testing and development uniformity, access to esoteric hardware and programming environments, scalability, and security. We present RAI, a secure and extensible system for delivering open-ended programming assignments configured with access to different hardware and software requirements. We describe how the system was used to deliver a programming-competition-style final project in an introductory GPU programming course at the University of Illinois Urbana-Champaign. Abdul Dakkak, Carl Pearson, and Cheng Li authored the paper.

Click the title above to watch the recorded presentation.

Carl Pearson Receives the Mavis Future Faculty Fellowship (MF3)

The Mavis Future Faculty Fellows (MF3) Academy in the College of Engineering was developed to facilitate the training of the next generation of great engineering professors. Engineering at Illinois is internationally recognized for the impact of our research and the strength of our graduate education. The doctoral programs that produce this reputation are primarily research-focused and may not provide students interested in academic careers with the opportunity to gain the knowledge of how to become a highly productive faculty member. To help address this issue, the Office of Engineering Graduate, Professional and Online Programs facilitates the MF3 Academy where fellows participate in a series of workshops, seminars, and activities that cover various aspects of an academic career.

On the Cutting Edge: What's Next Beyond the Horizon? (April 18, 2017)

Wen-Mei Hwu spoke at the US-China Innovation Development Forum.


Izzat El Hajj Receives Dan Vivoli Endowed Fellowship

This fellowship is given to students interested in the use of a Graphics Processing Unit (GPU) in parallel computing, the use of a GPU in solving high performance computing problems, and the development of applications for the GPU. Previous IMPACT recipients include Li-Wen Chang, Xiao-Long Wu, and John Stratton.

Mert Hidayetoglu receives the Professor Kung Chie Yeh Endowed Fellowship.

This $5,000 fellowship was established in memory of Professor Kung Chie Yeh, who devoted his career to teaching and research in wave propagation and upper atmospheric physics and morphology. Recipients must be conducting research in wave propagation and upper atmospheric physics.

Now Available: Programming Massively Parallel Processors 3rd Edition

The much anticipated 3rd edition of Programming Massively Parallel Processors: A Hands-on Approach by David Kirk and Wen-mei Hwu is now available through all major booksellers, including Amazon.com and Barnes & Noble. Broadly speaking, there are three major improvements in the third edition, while preserving the most valued features of the first two editions: (1) new Chapters 9 (histogram), 11 (merge sort), and 12 (graph search) that introduce frequently used parallel algorithm patterns, (2) a new Chapter 16 on deep learning as an application case study, and (3) a new chapter that clarifies the evolution of advanced CUDA features. These additions are designed to further enrich the learning experience of our readers. The book is used by more than 100 institutions worldwide for teaching applied parallel programming using GPUs.

Wen-Mei Speaks at Severo Ochoa seminar at Universitat Politècnica de Catalunya (February 17, 2017)

Wen-Mei Gives Talk at UC Irvine (February 10, 2017)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since early 2000, we have been experiencing two very important developments in computing: a tremendous amount of resources have been invested into innovative applications and industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. In this talk, Dr. Hwu presents some research opportunities and challenges that are brought about by this perfect storm.

Slides (PDF)
Video
  
2016
 

Wen-Mei Speaks at SuperComputing 2016


What a great time to be a student in computing!

Wen-Mei gave the Education/Career Keynote at SuperComputing 2016 in Salt Lake City.

"Programming Massively Parallel Processors" Text and GPU Teaching Kit: New 3rd Edition

Wen-Mei gave two talks introducing the 3rd edition of "Programming Massively Parallel Processors: A Hands-on Approach". This new edition is the result of a collaboration between GPU computing experts and covers the CUDA computing platform, parallel patterns, case studies, and other programming models. Brand new chapters cover deep learning, graph search, sparse matrix computation, histogram, and merge sort.

The tightly-coupled GPU Teaching Kit contains everything needed to teach university courses and labs with GPUs.

Education - Career Keynote (Slides) (PDF)
Programming Massively Parallel Processors (Slides) (PDF)
  

Wen-Mei Hwu on IBM Watson Panel Discussion (October 24, 2016)

Advancing the Scientific Frontiers of Cognitive Systems

Cognitive systems learn from vast amounts of complex, ambiguous information and help us do amazing things, such as treat disease, manage finances, and transform commerce. Underneath these systems, the core fields of science and technology, from artificial intelligence to brain science to computer architecture to cognitive science, are advancing rapidly and achieving breakthroughs not envisioned even a few years ago. IBM Research and its network of scientific partners are pursuing some of the hardest technical problems while creating practical solutions that make a difference to the world.

University of Texas at Austin Invited Talk (August 30, 2016)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since early 2000, we have been experiencing two very important developments in computing: a tremendous amount of resources have been invested into innovative applications and industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. In this talk, Dr. Hwu presents some research opportunities and challenges that are brought about by this perfect storm.

Slides (PDF)
  

Teaching Kit Tutorial at XSEDE 2016

Joe Bungo (NVIDIA), Andy Schuh (UIUC), and Carl Pearson (UIUC) will host a hands-on tutorial focused on the Accelerated Computing Teaching Kit. The tutorial introduces a comprehensive set of academic labs and university teaching material for use in introductory and advanced accelerated computing courses, and takes attendees through some of the same introductory and intermediate/advanced lecture slides and hands-on lab exercises that are part of the curriculum.

PUMPS 2016 (Barcelona, Spain)

In its seventh edition, the Programming and tUning Massively Parallel Systems summer school (PUMPS) offers researchers and graduate students a unique opportunity to improve their skills with cutting-edge techniques and hands-on experience in developing and tuning applications for many-core processors with massively parallel computing resources like GPU accelerators.

The summer school is oriented towards advanced programming and optimizations, and thus previous experience in basic GPU programming will be considered in the selection process. We will also consider the current parallel applications and numerical methods you are familiar with, and the specific optimizations you would like to discuss.

Teach GPU Accelerated Computing: Hands-on Lunch Session with NVIDIA Teaching Kit for Educators Presented by NVIDIA and The University of Illinois (UIUC)

As performance and functionality requirements of interdisciplinary computing applications rise, industry demand for new graduates familiar with accelerated computing with GPUs grows. In the future, many mass-market applications will be what are considered "supercomputing applications" by today's standards. This hands-on tutorial introduces a comprehensive set of academic labs and university teaching materials for use in introductory and advanced parallel programming courses. The teaching materials start with the basics and focus on programming GPUs, and include advanced topics such as optimization, advanced architectural enhancements, and integration of a variety of programming languages. Lunch will be provided. This session utilizes GPU resources in the cloud, so you are required to bring your own laptop. This event is focused on teaching faculty, so student registrations are subject to availability via http://nvidia-gpu-teaching-kit-isc16.eventbrite.com.

Wen-Mei gives ICS Keynote (June 2, 2016)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since early 2000, we have been experiencing two very important developments in computing. One is that a tremendous amount of resources have been invested into innovative applications such as first-principle based models, deep learning, and cognitive computing. The other is that industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Since then, most of the top supercomputers in the world are heterogeneous parallel computing systems. New standards such as the Heterogeneous Systems Architecture (HSA) are emerging to facilitate software development. Much has been and needs to be learned about algorithms, languages, compilers, and hardware architecture in this movement. What are the applications that continue to drive the technology development? How hard is it to program these systems today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges? What is the impact on long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges that are brought about by this perfect storm.

IPDPS - Teaching Kit Tutorial (May 24, 2016)

At IPDPS 16, Dr. Wen-Mei Hwu from The University of Illinois (UIUC) will lead a free hands-on tutorial that introduces the GPU Teaching Kit for Accelerated Computing for use in university courses that can benefit from parallel processing.


Wen-Mei Hwu gives AsHES Keynote in Chicago (May 23, 2016)

Since the introduction of CUDA in 2006, we have made tremendous progress in heterogeneous supercomputing. We have built heterogeneous top-ranked supercomputers. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. What is the benefit that science teams are seeing? How hard is it to program these systems today? How will we program these systems in the future? In this talk, I will go over the lessons learned from educating programmers, migrating Blue Waters applications onto GPUs, and developing performance-critical libraries. I will then give a preview of the types of programming systems that will be needed to further reduce the software cost of heterogeneous computing.


Invited Lecture at the University of Pennsylvania Symposium (April 20, 2016)

Parallelism, Heterogeneity, Locality, why bother?
Speaker: Wen-Mei Hwu

Computing systems have become power-limited as Dennard scaling got off track in the early 2000s. In response, the industry has taken a path where application performance and power efficiency can vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Since then, we have built heterogeneous top supercomputers; most of the top supercomputers in the world are heterogeneous parallel computing systems. We have mass-produced heterogeneous mobile computing devices. New standards such as the Heterogeneous Systems Architecture (HSA) are emerging to facilitate software development. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. Why do applications bother to use these systems? How hard is it to program these systems today? How will we program these systems in the future? How will heterogeneity in memory devices present further opportunities and challenges? What is the impact on long-term software engineering cost of applications? In this talk, I will go over the lessons learned from educating programmers and developing performance-critical libraries. I will then give a preview of the types of programming systems that will be needed to further reduce the software cost of heterogeneous computing.

GTC 2016 - Accelerated Computing Teaching Kit Released (April 5, 2016)

NVIDIA and the University of Illinois released the GPU Teaching Kit at the GPU Technology Conference in San Jose, CA. Joe Bungo introduced the NVIDIA Accelerated Computing Teaching Kit program, Wen-Mei Hwu described the scope and sequence of the curriculum and Abdul Dakkak conducted a live lab using two assignments from the curriculum.


Nacho Navarro Memorial (March 19, 2016)


Wen-Mei and Sabrina Hwu spoke at the memorial held by the Universitat Politècnica de Catalunya and the Barcelona Supercomputing Center.


Wen-Mei Hwu Tours Singapore

Wen-Mei Hwu spoke at SuperComputing Frontiers 2016 (March 15-18), the National University of Singapore, and Nanyang Technological University. In addition, he toured the University of Illinois's Advanced Digital Sciences Center (ADSC) located at the Fusionopolis research facility in Singapore.


2015
 

SC 2015 - Accelerated Computing Teaching Kit Presentation (November 17, 2015)

Teaching Heterogeneous Parallel Programming with GPUs

As performance and functionality requirements of interdisciplinary computing applications rise, industry demand for new graduates familiar with parallel programming with accelerators grows. This session introduces a comprehensive set of academic labs and university teaching material for use in introductory and advanced parallel/multicore programming courses. The educational materials start with the basics and focus on programming GPUs, and include advanced topics such as optimization, advanced architectural enhancements and integration of a variety of programming languages and APIs. These comprehensive teaching materials will be made freely available to university instructors around the globe. Session participants will be provided free lunch, immediate access to the material and a chance to win a free copy of the associated textbook!


Wen-Mei Hwu gives Distinguished Lecture Series Talk, Electrical and Computer Engineering Department, University of California Santa Barbara (UCSB) (October 26, 2015)

What have we learned about programming heterogeneous computing systems?


Wen-Mei Hwu gives Distinguished Lecture Series Talk, Computer Science Department, Wayne State University (WSU) (October 6, 2015)

What have we learned about programming heterogeneous computing systems?


Wen-Mei gives Computational Electromagnetics (CEM'15) Keynote Talk

Transitioning HPC Software to Exascale Heterogeneous Computing


Past IMPACT Paper Awarded Micro Test-of-Time Award

Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, Roger A. Bringmann for their Micro (1992) paper entitled Effective Compiler Support For Predicated Execution Using the Hyperblock.

David August (former IMPACT member), et al Awarded CGO Test-of-Time Award

George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August for their CGO (2005) paper entitled SWIFT: Software Implemented Fault Tolerance.

Wen-Mei Hwu, et al Awarded Micro Test-of-Time Award for Two Papers

Yale N. Patt, Wen-mei Hwu, Michael C. Shebanow for their Micro-18 (1985) paper entitled HPS, a New Microarchitecture: Rationale and Introduction
Yale N. Patt, Stephen W. Melvin, Wen-Mei Hwu, Michael C. Shebanow for their Micro-18 (1985) paper entitled Critical Issues Regarding HPS, A High Performance Microarchitecture

Application Performance Portability for Heterogeneous Computing (February 19, 2015)

C-FAR
Speaker: Wen-Mei Hwu


All computing systems, from mobile devices to supercomputers, are becoming energy limited. This has motivated the adoption of heterogeneous computing to significantly increase energy efficiency. A critical ingredient of pervasive utilization of heterogeneous computing is the ability to run applications on different compute engines without software redevelopment. Our C-FAR work in the past two years has focused on performance portability at two distinct levels. At the higher level, Tangram supports expression of algorithm hierarchies that allows generic library code to be auto-configured to near-expert-code performance on each target hardware. At the lower level, MxPA compiles existing OpenCL kernels with locality-centric work scheduling and code generation policies. In this talk, I will present the important dimensions of performance portability, key features of Tangram/MxPA, experimental results, and some comparisons with existing industry solutions.

University of Chicago Distinguished Lecture Series - Rethinking Computer Architecture for Energy Limited Computing (January 22, 2015)

Rethinking Computer Architecture for Energy Limited Computing
Speaker: Wen-Mei Hwu

The rise of rich media in mobile devices and massive analytics in data centers has created new opportunities and challenges for computer architects. On one hand, commercial hardware has been undergoing fast transformation to drastically increase the throughput of processing large amounts of data while keeping the power consumption in check. On the other hand, computer architecture has evolved too slowly to facilitate hardware innovations, software productivity, algorithm advancement and user perceived improvements. In this talk, I will present some major challenges facing the computer architecture research community and some recent advancements in throughput computing. I will argue that we must rethink the scope of computer architecture research as we seek to create growth paths for the computer systems industry.

Slides (PDF)
  
2014
 

Wen-mei Hwu Named Recipient of 2014 IEEE Computer Society B. Ramakrishna Rau Award (October 28, 2014)

University of Illinois at Urbana-Champaign Professor Wen-mei W. Hwu has been named the 2014 recipient of the IEEE Computer Society B. Ramakrishna Rau Award for his work in Instruction-Level Parallelism.

The Rau Award recognizes significant accomplishments in the field of microarchitecture and compiler code generation. Hwu was recognized for contributions to Instruction Level Parallelism technology, including compiler optimization, program representation, microarchitecture, and applications.
 
The award, which comes with a $2,000 honorarium and a certificate, will be given out on Tuesday, 16 December at the ACM/IEEE International Symposium on Microarchitecture (MICRO) in Cambridge, UK. IEEE Computer Society established the award in 2010 in memory of the late Bob Rau, an HP Senior Fellow.

Slides (PDF)
Video
  

Columbia University - Invited Talk: Moving Towards Exascale with Lessons Learned from GPU Computing (October 14, 2014)

The rise of GPU computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. I will discuss some insights gained and a vision for moving applications into exascale computing.

University of British Columbia: Invited Distinguished Lecture (October 5, 2014)

The rise of GPU computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. I will discuss some insights gained and a vision for moving applications into exascale computing.

Slides (PDF)
  

PCI Distinguished Lecture: Runtime Aware Architectures (September 22, 2014)



You are invited to attend the next talk in the PCI Distinguished Lectures. On Monday, September 22nd, Mateo Valero of the Barcelona Supercomputing Center will be speaking in CSL B02 at 10am on Runtime Aware Architectures.

The traditional ways of increasing hardware performance predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack, thanks to a well-defined Instruction Set Architecture. This simple interface allowed developers to design applications without much concern for the hardware, while hardware designers were able to exploit parallelism in superscalar processors. With the irruption of multi-cores and parallel applications, this approach no longer worked. As a result, the role of decoupling applications from the hardware was moved to the runtime system. Efficiently using the underlying hardware from this runtime, without exposing its complexities to the application, has been the target of research in recent years.

It is our position that the runtime has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability, and resilience that multi-cores have. In this talk, we introduce an approach towards a Runtime-Aware Architecture, a massively parallel architecture designed from the runtime's perspective.

Mateo Valero is full professor at the Computer Architecture Department at UPC, in Barcelona. His research interests focuses on high performance architectures. He has published approximately 600 papers, has served in the organization of more than 300 International Conferences and he has given more than 400 invited talks. He is the director of the Barcelona Supercomputing Center, the National Center of Supercomputing in Spain.

Flyer (PDF)
  

Celebrating Yale Patt's 75th - Rethinking Computer Architecture (September 19, 2014)

Slides (PDF)
  

NPC 2014 (September 18, 2014)

VIA Technologies (China) Co., Ltd.: Invited Talk (September 13, 2014)

CHANGES 2014 (CHinese-AmericaN-German E-Science and cyberinfrastructure): Invited Talk in Beijing, China (September 11, 2014)

For the past several years, NCSA has been partnering with the Computer Network Information Center (CNIC) within the Chinese Academy of Sciences to hold the American-Chinese Cyberinfrastructure and E-Science workShop (ACCESS) on topics germane to high performance computing. Beginning in 2012, that relationship has grown to include the Juelich Supercomputing Centre in Germany. This year's workshop, called CHANGES 2014 (CHinese-AmericaN-German E-Science and cyberinfrastructure), will be held September 10-12 in Beijing, China with the theme "Application Scaling and the Use of Accelerators." I would like to invite you to present at this year's CHANGES Workshop in support of this theme. NCSA would cover all of your travel expenses to attend this workshop.

Invited Talk at OpenSoC Fabric Workshop: Intentional Programming for Productive Development of Portable, High Efficiency Software in the SoC Heterogeneous Computing Era (August 26, 2014)

In a modern SoC computing system, we have a large variety of computing devices such as CPU cores, GPUs, DSPs, and specialized micro-engines. The OpenCL 2.0 and HSA standards offer low-level programming interfaces that provide some basic level of portability. However, the algorithms are typically baked into the kernel code already. In this talk, I will advocate the use of Intentional Programming to better separate the concerns of what needs to be done and how the work should be done. This would allow better algorithm adjustment and more global optimization for applications. I will show some initial results from the Triolet-Python project.

Slides (PDF)
  

Christopher Rodrigues Doctoral Dissertation: Supporting High-Level, High-Performance Parallel Programming with Library-Driven Optimization

This dissertation presents Triolet, a programming language and compiler for high-level programming of parallel loops for high-performance execution on clusters of multicore computers. Triolet's design demonstrates that it is possible to decouple the design of a compiler from the implementation of parallelism without sacrificing performance or ease of use. This programming approach opens the potential for future research into parallel programming frameworks.

DoD ACS Productivity Workshop (July 16, 2014)

Wen-mei Hwu gave a presentation at the DoD ACS Productivity workshop. He first discussed the current status of performance portable programming using OpenCL and MxPA. He then gave a vision for drastic improvement of productivity in performance portable applications using new research breakthroughs from UIUC: Triolet and Tangram.

Slides (PDF)
  

Wen-Mei Hwu Awarded the Collins Award for Innovative Teaching

Hwu earned the Collins Award for Innovative Teaching, an award that recognizes outstanding development or use of new and innovative teaching methods. Hwu's major avenue of innovative teaching is his development of a parallel programming course, both in the ECE ILLINOIS classroom and across the world online. His massive open online course on parallel programming was first offered through Coursera in 2012, and a revised version of the class was offered earlier this year.

The course contents have been so popular that they have also been translated into Chinese, Japanese, Russian, and other languages. Considering his pioneering efforts in teaching an increasingly relevant topic, and his commitment to using online platforms to reach a wide pool of students, it is no wonder that he was awarded the Collins Award for Innovative Teaching.

Media TEK Taiwan (Non-Public Talk) (June 30, 2014)

ISCA 2014 Tutorial: Heterogeneous System Architecture (HSA): Architecture and Algorithms Tutorial (June 15, 2014)

Heterogeneous computing is emerging as a requirement for power-efficient system design: modern platforms no longer rely on a single general-purpose processor, but instead benefit from dedicated processors tailored for each task.  Traditionally these specialized processors have been difficult to program due to separate memory spaces, kernel-driver-level interfaces, and specialized programming models.  The Heterogeneous System Architecture (HSA) aims to bridge this gap by providing a common system architecture and a basis for designing higher-level programming models for all devices.  This tutorial will bring in experts from member companies of the HSA Foundation to describe the Heterogeneous Systems Architecture and how it addresses the challenges of modern computing devices.  Additionally, the tutorial will show example applications and use cases that can benefit from the features of HSA.

Programming and Tuning Massively Parallel Systems summer school (PUMPS) (June 7, 2014)


The fifth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students, and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources like GPU accelerators. (July 7-11, 2014)

Cornell Departmental Lecture - Scalability, Portability, and Productivity in GPU Computing (April 7, 2014)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded heterogeneous computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, portability, and productivity in GPU computing. In this talk, I will give a brief overview of the NSF Blue Waters petascale heterogeneous parallel computing system at the University of Illinois. I will show experimental results of our achievements to date in applications, libraries, and MxPA. I will then discuss our current work on Tangram and Triolet projects that are aimed to drastically reduce the development and maintenance cost of heterogeneous parallel computing applications.

Slides (PDF)
  

Wen-Mei Hwu Speaks at Michigan Engineering (March 19, 2014)

Scalability, Performance, Stability, and Portability of Many-core Computing Algorithms

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, and portability at low development cost. In this talk, I will go over the major building blocks involved: memory layout and dynamic tiling. I will show experimental results to demonstrate how these building blocks jointly enable the first scalable, numerically stable tri-diagonal solver that matches the numerical stability of the Intel Math Kernel Library (MKL) and surpasses the performance of CUSPARSE. I will then give an overview of the Tangram and Triolet projects, which aim to drastically improve the quality and reduce the development and maintenance cost of future many-core algorithms.

Flyer (PDF)
Slides (PDF)
  

Parboil Benchmark Contributions Receive the SPECtacular Award

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops benchmark suites and also reviews and publishes submitted results from our member organizations and other benchmark licensees.

The IMPACT Research Group, and specifically the work of Dr. John Stratton, was honored at the SPEC annual meeting in January with the SPECtacular Award for collaboratively developing the OpenCL benchmark suite that has been adopted as part of the SPEC Accelerator Benchmark Suite (to be released soon). The Parboil benchmarks are a set of applications useful for studying the performance of throughput computing architectures and compilers. The benchmarks come from a variety of scientific and commercial fields including image processing, biomolecular simulation, fluid dynamics, and astronomy.

Wen-Mei Hwu Speaks at University of California, Riverside (March 10, 2014)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, and portability at low development cost. In this talk, I will go over the major building blocks involved: memory layout and dynamic tiling. I will show experimental results to demonstrate how these building blocks jointly enable the first scalable, numerically stable tri-diagonal solver that matches the numerical stability of the Intel Math Kernel Library (MKL) and surpasses the performance of CUSPARSE. I will then give an overview of the Tangram and Triolet projects, which aim to drastically improve the quality and reduce the development and maintenance cost of future many-core algorithms.

Slides (PDF)
  

ECE UIUC Offers Coursera Heterogeneous Parallel Programming Course (January 6, 2014)

Wen-mei Hwu will be teaching the UIUC-Coursera Heterogeneous Parallel Programming Course in winter 2014.

This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. During the previous offering, 25,527 students enrolled, 15,000 students watched videos, 9,908 students participated in quizzes and labs, and 2,811 students earned a Certificate of Achievement or Certificate of Distinction. The contents and structure of the course have been significantly revised based on the experience gained. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, techniques for overlapping communication with computation, and parallel algorithm patterns. The launch date of the course is January 6, 2014.
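One of the listed course topics, overlapping communication with computation, can be sketched with CUDA streams: chunks of data are copied asynchronously while kernels for other chunks execute. The kernel, array names, and chunk sizes below are hypothetical, not taken from the course labs.

```cuda
// Overlapping host-device transfers with kernel execution using CUDA streams.
#include <cuda_runtime.h>

__global__ void process_chunk(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = data[i] * 2.0f + 1.0f;  // placeholder computation
}

int main() {
  const int N = 1 << 24, CHUNK = 1 << 20, NSTREAMS = 4;
  float *h_data, *d_data;
  cudaHostAlloc(&h_data, N * sizeof(float), cudaHostAllocDefault);  // pinned
  cudaMalloc(&d_data, N * sizeof(float));
  // ... initialize h_data with real input here ...

  cudaStream_t streams[NSTREAMS];
  for (int s = 0; s < NSTREAMS; ++s) cudaStreamCreate(&streams[s]);

  // Pipeline: while one chunk is in flight on the bus, another chunk's
  // kernel runs, hiding transfer time behind computation.
  for (int off = 0, s = 0; off < N; off += CHUNK, s = (s + 1) % NSTREAMS) {
    cudaMemcpyAsync(d_data + off, h_data + off, CHUNK * sizeof(float),
                    cudaMemcpyHostToDevice, streams[s]);
    process_chunk<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d_data + off, CHUNK);
    cudaMemcpyAsync(h_data + off, d_data + off, CHUNK * sizeof(float),
                    cudaMemcpyDeviceToHost, streams[s]);
  }
  cudaDeviceSynchronize();

  for (int s = 0; s < NSTREAMS; ++s) cudaStreamDestroy(streams[s]);
  cudaFreeHost(h_data);
  cudaFree(d_data);
  return 0;
}
```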

2013
 

Wen-Mei Hwu Speaks at NVidia GPU Technology Theater at SuperComputing 2013 (November 20, 2013)

Wen-Mei Hwu presents two synergistic systems that enable productive development of scalable, efficient data-parallel code. Triolet is a Python-syntax based functional programming system where library implementers direct the compiler to perform parallelization and deep optimization. Tangram is an algorithm framework that supports effective parallelization of linear recurrence computation.

Slides (PDF)
  

Scalability, portability, and numerical stability in GPU computing (November 14, 2013)

IWCSE. Abstract: The rise of heterogeneous computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. However, there has been a lack of practical languages and compilers in this movement. In preparing petascale applications for deployment on Blue Waters, we see critical needs in these areas. I will discuss some recent progress in developing scalable, portable, and numerically stable libraries today and several important research opportunities. I will also review some recent advancements in languages and compilers for developing scalable numerical libraries.

Wen-Mei Speaks at Computer Architecture Lab at Carnegie Melon (October 28, 2013)

The prevailing perception about programming heterogeneous parallel computing systems today is that each major type of computing device, such as CPUs, GPUs, and DSPs, requires a different version of source code to achieve high performance. Even with a common data-parallel language such as OpenCL, most developers assume that they need to write different versions of source code for different device types. Unfortunately, such code versioning drastically increases the development, testing, and maintenance cost of applications. In this talk, I will present two systems, one at the OpenCL level (MxPA) and one at the functional programming level (Triolet), that enable single-source development of data-parallel code for diverse device types. MxPA is a commercial product today and Triolet is a research prototype. For MxPA, I will show the key compiler transformations that enable a code base developed according to GPU performance guidelines to achieve high performance on multicore CPUs with SIMD instructions. I will also show the reasons why many GPU algorithms and CPU algorithms are converging. This brings up an interesting question: does the traditional definition of processor architectures really matter for application developers?

Webpage
  

Wen-Mei Hwu Speaks at Challenges in Genomics and Computing (July 22, 2013)

Wen-Mei Hwu gave a plenary talk, "Applications of Accelerators and other Technologies in Sequencing and their Success" and served as a panelist on the "Computing and Genomics: A Big Data Research Challenge" forum.

Challenges in Genomics and Computing met in Bangalore, India, July 22-24.

Slides (PDF)
  

Wen-mei Hwu gives 2013 Keynote at SAMOS Conference (June 15, 2013)

Wen-mei Hwu presented a keynote lecture, entitled Rethinking Computer Architecture for Throughput Computing, at the SAMOS XIII conference in Vathi, Samos Island, Greece. In the keynote, Wen-mei first presented results and lessons learned from porting major HPC science applications to utilize the 4096 Kepler GPUs in the Blue Waters supercomputer at the University of Illinois. He then discussed major challenges facing throughput computing applications in general. On one hand, commercial computer hardware organizations have been undergoing fast transformation to drastically increase the throughput of processing large amounts of data while keeping power consumption in check. On the other hand, computer architecture has evolved too slowly to facilitate hardware innovations, software productivity, algorithm advancement, and user-perceived improvements. The research community must rethink the scope of computer architecture research as we seek to create growth paths for the computer systems industry.

Keynote Abstract at the 2013 Samos Conference (PDF)
Keynote Slides at the 2013 Samos Conference (PDF)
  

Wen-Mei Hwu Speaks at Cornell University (April 6, 2013)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded heterogeneous computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, portability, and productivity in GPU computing. In this talk, I will give a brief overview of the NSF Blue Waters petascale heterogeneous parallel computing system at the University of Illinois. I will show experimental results of our achievements to date in applications, libraries, and MxPA. I will then discuss our current work on Tangram and Triolet projects that are aimed to drastically reduce the development and maintenance cost of heterogeneous parallel computing applications.

Hwu Highlighted in Russian Periodical

Wen-Mei Hwu was recently interviewed for an article which appeared in Russia's Supercomputers magazine. The current link shows a low-res copy of the original Russian article. We have requested an English translation and will post it when available.

Magazine Cover (JPG)
  

University of Illinois Snags 2013 CCOE Achievement Award

Researchers from the University of Illinois at Urbana-Champaign snagged the Second Annual Achievement Award for CUDA Centers of Excellence (CCOE) for their research with Blue Waters. The team was selected along with three other groups of researchers from CCOE institutions, which include some of the world's top universities engaged in cutting-edge work with CUDA and GPU computing.

Each of the world's 21 CCOEs was asked to submit an abstract describing their top achievement in GPU computing over the past year. A panel of experts, led by NVIDIA Chief Scientist Bill Dally, selected four CCOEs to present their achievements at a special event during NVIDIA's annual GPU Technology Conference this week in San Jose. CCOE peers voted for their favorite, which won bragging rights as the recipient of the second CUDA Achievement Award 2013.

The four finalists each received a GeForce GTX Titan, built around the same Kepler chip that powers the world's fastest supercomputer, the Titan system at Oak Ridge National Laboratory. The University of Illinois not only won bragging rights and a Titan, but also got to take home a Microsoft Surface tablet and a special NVIDIA green keyboard.

Achievement Award Submission (PDF)
  

Wen-mei Hwu Keynote at the 2013 HiPEAC Conference in Berlin, Germany (January 21, 2013)

Wen-mei Hwu presented a keynote lecture at the 2013 HiPEAC Conference in Berlin, Germany. Wen-mei observed that the rise of heterogeneous computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. However, there has been a lack of practical languages and compilers in this movement. He discussed some recent progress in developing scalable, portable, and numerically stable libraries today and several important research opportunities. He then presented recent advancements and future needs in languages and compilers for developing scalable numerical libraries.

Abstract (PDF)
Slides (PDF)