2017
 

Izzat El Hajj Receives Dan Vivoli Endowed Fellowship

This fellowship is given to students interested in the use of Graphics Processing Units (GPUs) in parallel computing, in solving high-performance computing problems, and in the development of GPU applications. Previous IMPACT recipients include Li-Wen Chang, Xiao-Long Wu, and John Stratton.

Mert Hidayetoglu Receives the Professor Kung Chie Yeh Endowed Fellowship

This $5,000 fellowship was established in memory of Professor Kung Chie Yeh, who devoted his career to teaching and research in wave propagation and upper atmospheric physics and morphology. Recipients must be conducting research in wave propagation and upper atmospheric physics.

Now Available: Programming Massively Parallel Processors 3rd Edition

The much anticipated 3rd edition of Programming Massively Parallel Processors: A Hands-on Approach by David Kirk and Wen-mei Hwu is now available through all major booksellers, including Amazon.com and Barnes & Noble. Broadly speaking, the third edition makes three major improvements while preserving the most valued features of the first two editions: (1) new Chapters 9 (histogram), 11 (merge sort), and 12 (graph search) that introduce frequently used parallel algorithm patterns, (2) a new Chapter 16 on deep learning as an application case study, and (3) a new chapter that clarifies the evolution of advanced CUDA features. These additions are designed to further enrich the learning experience of our readers. The book is used by more than 100 institutions worldwide for teaching applied parallel programming using GPUs.
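
As a flavor of the new pattern chapters, the sketch below shows a minimal privatized histogram kernel in CUDA: each thread block accumulates counts into a shared-memory copy of the bins and then merges that copy into the global histogram with atomic adds. This is an illustration only, not code taken from the book; the 256-bin input, the kernel name, and the launch configuration are assumptions made for the example.

#include <cuda_runtime.h>

#define NUM_BINS 256

// Each block builds a private histogram in shared memory, then merges it into
// the global bins, so most atomic contention stays on-chip.
__global__ void histogram256(const unsigned char *data, unsigned int *bins, int n)
{
    __shared__ unsigned int local_bins[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local_bins[b] = 0;                       // clear the private copy
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local_bins[data[i]], 1u);     // count into shared memory
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local_bins[b]);      // merge into the global histogram
}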

Wen-Mei Speaks at Severo Ochoa Seminar at Universitat Politècnica de Catalunya (February 17, 2017)

Wen-Mei Gives Talk at UC Irvine (February 10, 2017)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since the early 2000s, we have been experiencing two very important developments in computing: a tremendous amount of resources has been invested in innovative applications, and industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. In this talk, Dr. Hwu presents some research opportunities and challenges that are brought about by this perfect storm.

Slides (PDF)
Video
  
2016
 

Wen-Mei Speaks at SuperComputing 2016


What a great time to be a student in computing!

Wen-Mei gave the Education/Career Keynote at SuperComputing 2016 in Salt Lake City.

"Programming Massively Parallel Processors" Text and GPU Teaching Kit: New 3rd Edition

Wen-Mei gave two talks introducing the 3rd edition of "Programming Massively Parallel Processors: A Hands-on Approach". This new edition is the result of a collaboration between GPU computing experts and covers the CUDA computing platform, parallel patterns, case studies, and other programming models. Brand-new chapters cover deep learning, graph search, sparse matrix computation, histogram, and merge sort.

The tightly-coupled GPU Teaching Kit contains everything needed to teach university courses and labs with GPUs.

Education - Career Keynote (Slides) (PDF)
Programming Massively Parallel Processors (Slides) (PDF)
  

Wen-Mei Hwu on IBM Watson Panel Discussion (October 24, 2016)

Advancing the Scientific Frontiers of Cognitive Systems

Cognitive systems learn from vast amounts of complex, ambiguous information and help us do amazing things, such as treat disease, manage finances, and transform commerce. Underneath these systems, the core fields of science and technology, from artificial intelligence to brain science to computer architecture to cognitive science, are advancing rapidly and achieving breakthroughs not envisioned even a few years ago. IBM Research and its network of scientific partners are pursuing some of the hardest technical problems while creating practical solutions that make a difference to the world.

University of Texas at Austin Invited Talk (August 30, 2016)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since the early 2000s, we have been experiencing two very important developments in computing: a tremendous amount of resources has been invested in innovative applications, and industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. In this talk, Dr. Hwu presents some research opportunities and challenges that are brought about by this perfect storm.

Slides (PDF)
  

Teaching Kit Tutorial at XSEDE 2016

Joe Bungo (NVIDIA), Andy Schuh (UIUC), and Carl Pearson (UIUC) will host a hands-on tutorial focused on the Accelerated Computing Teaching Kit. The tutorial introduces a comprehensive set of academic labs and university teaching material for use in introductory and advanced accelerated computing courses, and takes attendees through some of the introductory and intermediate/advanced lecture slides and hands-on lab exercises that are part of the curriculum.

PUMPS 2016 (Barcelona, Spain)

In its seventh edition, the Programming and tUning Massively Parallel Systems summer school (PUMPS) offers researchers and graduate students a unique opportunity to improve their skills with cutting-edge techniques and hands-on experience in developing and tuning applications for many-core processors with massively parallel computing resources like GPU accelerators.

The summer school is oriented towards advanced programming and optimizations, and thus previous experience in basic GPU programming will be considered in the selection process. We will also consider the current parallel applications and numerical methods you are familiar with, and the specific optimizations you would like to discuss.

Teach GPU Accelerated Computing: Hands-on Lunch Session with NVIDIA Teaching Kit for Educators Presented by NVIDIA and The University of Illinois (UIUC)

As performance and functionality requirements of interdisciplinary computing applications rise, industry demand for new graduates familiar with accelerated computing with GPUs grows. In the future, many mass-market applications will be what are considered "supercomputing applications" by today's standards. This hands-on tutorial introduces a comprehensive set of academic labs and university teaching materials for use in introductory and advanced parallel programming courses. The teaching materials start with the basics and focus on programming GPUs, and include advanced topics such as optimization, advanced architectural enhancements, and integration of a variety of programming languages. Lunch will be provided. This session utilizes GPU resources in the cloud, so you are required to bring your own laptop. This event is focused on teaching faculty, so student registrations are subject to availability via http://nvidia-gpu-teaching-kit-isc16.eventbrite.com.

Wen-Mei gives ICS Keynote (June 2, 2016)

Innovative Applications and Technology Pivots - A Perfect Storm in Computing

Since the early 2000s, we have been experiencing two very important developments in computing. One is that a tremendous amount of resources has been invested in innovative applications such as first-principles-based models, deep learning, and cognitive computing. The other is that the industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. As a result, most of the top supercomputers in the world are heterogeneous parallel computing systems. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been, and still needs to be, learned about algorithms, languages, compilers, and hardware architecture in this movement. What are the applications that continue to drive the technology development? How hard is it to program these systems today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges that are brought about by this perfect storm.

IPDPS - Teaching Kit Tutorial (May 24, 2016)

At IPDPS 16, Dr. Wen-Mei Hwu from The University of Illinois (UIUC) will lead a free hands-on tutorial that introduces the GPU Teaching Kit for Accelerated Computing for use in university courses that can benefit from parallel processing.


Wen-Mei Hwu gives AsHES Keynote in Chicago (May 23, 2016)

Since the introduction of CUDA in 2006, we have made tremendous progress in heterogeneous supercomputing. We have built heterogeneous top-ranked supercomputers. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. What benefits are science teams seeing? How hard is it to program these systems today? How will we program these systems in the future? In this talk, I will go over the lessons learned from educating programmers, migrating Blue Waters applications to GPUs, and developing performance-critical libraries. I will then give a preview of the types of programming systems that will be needed to further reduce the software cost of heterogeneous computing.


Invited Lecture at the University of Pennsylvania Symposium (April 20, 2016)

Parallelism, Heterogeneity, Locality, why bother?
Speaker: Wen-Mei Hwu

Computing systems have become power-limited as Dennard scaling broke down in the early 2000s. In response, the industry has taken a path where application performance and power efficiency can vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Since then, we have built heterogeneous top supercomputers; most of the top supercomputers in the world are heterogeneous parallel computing systems. We have mass-produced heterogeneous mobile computing devices. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. Why do applications bother to use these systems? How hard is it to program these systems today? How will we program these systems in the future? How will heterogeneity in memory devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will go over the lessons learned from educating programmers and developing performance-critical libraries. I will then give a preview of the types of programming systems that will be needed to further reduce the software cost of heterogeneous computing.


GTC 2016 - Accelerated Computing Teaching Kit Released (April 5, 2016)

NVIDIA and the University of Illinois released the GPU Teaching Kit at the GPU Technology Conference in San Jose, CA. Joe Bungo introduced the NVIDIA Accelerated Computing Teaching Kit program, Wen-Mei Hwu described the scope and sequence of the curriculum, and Abdul Dakkak conducted a live lab using two assignments from the curriculum.


Nacho Navarro Memorial (March 19, 2016)


Wen-Mei and Sabrina Hwu spoke at the memorial held by the Universitat Politècnica de Catalunya and the Barcelona Supercomputing Center.


Wen-Mei Hwu Tours Singapore

Wen-Mei Hwu spoke at SuperComputing Frontiers 2016 (March 15-18), the National University of Singapore, and Nanyang Technological University. In addition, he toured the University of Illinois's Advanced Digital Sciences Center (ADSC) located at the Fusionopolis research facility in Singapore.


2015
 

SC 2015 - Accelerated Computing Teaching Kit Presentation (November 17, 2015)

Teaching Heterogeneous Parallel Programming with GPUs

As performance and functionality requirements of interdisciplinary computing applications rise, industry demand for new graduates familiar with parallel programming with accelerators grows. This session introduces a comprehensive set of academic labs and university teaching material for use in introductory and advanced parallel/multicore programming courses. The educational materials start with the basics and focus on programming GPUs, and include advanced topics such as optimization, advanced architectural enhancements and integration of a variety of programming languages and APIs. These comprehensive teaching materials will be made freely available to university instructors around the globe. Session participants will be provided free lunch, immediate access to the material and a chance to win a free copy of the associated textbook!


Wen-Mei Hwu gives Distinguished Lecture Series Talk, Electrical and Computer Engineering Department, University of California Santa Barbara (UCSB) (October 26, 2015)

What have we learned about programming heterogeneous computing systems?


Wen-Mei Hwu gives Distinguished Lecture Series Talk, Computer Science Department, Wayne State University (WSU) (October 6, 2015)

What have we learned about programming heterogeneous computing systems?


Wen-Mei gives Computational Electromagnetics (CEM'15) Keynote Talk

Transitioning HPC Software to Exascale Heterogeneous Computing


Past IMPACT Paper Awarded Micro Test-of-Time Award

Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, Roger A. Bringmann for their Micro (1992) paper entitled Effective Compiler Support For Predicated Execution Using the Hyperblock.

David August (former IMPACT member), et al. Awarded CGO Test-of-Time Award

George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August for their CGO (2005) paper entitled SWIFT: Software Implemented Fault Tolerance.

Wen-Mei Hwu, et al. Awarded Micro Test-of-Time Award for Two Papers

Yale N. Patt, Wen-mei Hwu, Michael C. Shebanow for their Micro-18 (1985) paper entitled HPS, a New Microarchitecture: Rationale and Introduction
Yale N. Patt, Stephen W. Melvin, Wen-Mei Hwu, Michael C. Shebanow for their Micro-18 (1985) paper entitled Critical Issues Regarding HPS, A High Performance Microarchitecture

Application Performance Portability for Heterogeneous Computing (February 19, 2015)

C-FAR
Speaker: Wen-Mei Hwu


All computing systems, from mobile devices to supercomputers, are becoming energy-limited. This has motivated the adoption of heterogeneous computing to significantly increase energy efficiency. A critical ingredient of pervasive utilization of heterogeneous computing is the ability to run applications on different compute engines without software redevelopment. Our C-FAR work in the past two years has focused on performance portability at two distinct levels. At the higher level, Tangram supports expression of algorithm hierarchies that allows generic library code to be auto-configured to near-expert-code performance on each target hardware. At the lower level, MxPA compiles existing OpenCL kernels with locality-centric work scheduling and code generation policies. In this talk, I will present the important dimensions of performance portability, key features of Tangram/MxPA, experimental results, and some comparisons with existing industry solutions.

University of Chicago Distinguished Lecture Series - Rethinking Computer Architecture for Energy Limited Computing (January 22, 2015)

Rethinking Computer Architecture for Energy Limited Computing
Speaker: Wen-Mei Hwu

The rise of rich media in mobile devices and massive analytics in data centers has created new opportunities and challenges for computer architects. On one hand, commercial hardware has been undergoing fast transformation to drastically increase the throughput of processing large amounts of data while keeping the power consumption in check. On the other hand, computer architecture has evolved too slowly to facilitate hardware innovations, software productivity, algorithm advancement and user perceived improvements. In this talk, I will present some major challenges facing the computer architecture research community and some recent advancements in throughput computing. I will argue that we must rethink the scope of computer architecture research as we seek to create growth paths for the computer systems industry.

Slides (PDF)
  
2014
 

Wen-mei Hwu Named Recipient of 2014 IEEE Computer Society B. Ramakrishna Rau Award (October 28, 2014)

University of Illinois at Urbana-Champaign Professor Wen-mei W. Hwu has been named the 2014 recipient of the IEEE Computer Society B. Ramakrishna Rau Award for his work in Instruction-Level Parallelism.

The Rau Award recognizes significant accomplishments in the field of microarchitecture and compiler code generation. Hwu was recognized for contributions to Instruction Level Parallelism technology, including compiler optimization, program representation, microarchitecture, and applications.
 
The award, which comes with a $2,000 honorarium and a certificate, will be given out on Tuesday, 16 December at the ACM/IEEE International Symposium on Microarchitecture (MICRO) in Cambridge, UK. IEEE Computer Society established the award in 2010 in memory of the late Bob Rau, an HP Senior Fellow.

Slides (PDF)
Video
  

Columbia University - Invited Talk: Moving Towards Exascale with Lessons Learned from GPU Computing (October 14, 2014)

The rise of GPU computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. I will discuss some insights gained and a vision for moving applications into exascale computing.

University of British Columbia: Invited Distinguished Lecture (October 5, 2014)

The rise of GPU computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. Much has been learned about algorithms, languages, compilers, and hardware architecture in this movement. I will discuss some insights gained and a vision for moving applications into exascale computing.

Slides (PDF)
  

PCI Distinguished Lecture: Runtime Aware Architectures (September 22, 2014)



You are invited to attend the next talk in the PCI Distinguished Lecture series. On Monday, September 22nd, Mateo Valero of the Barcelona Supercomputing Center will be speaking in CSL B02 at 10am on Runtime Aware Architectures.

The traditional ways of increasing hardware performance predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack, thanks to a well-defined Instruction Set Architecture. This simple interface allowed developers to design applications without much concern for the hardware, while hardware designers were able to exploit parallelism in superscalar processors. With the arrival of multi-cores and parallel applications, this approach no longer worked. As a result, the role of decoupling applications from the hardware was moved to the runtime system. Efficiently using the underlying hardware from this runtime, without exposing its complexities to the application, has been the target of research in recent years.

It is our position that the runtime has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability, and resilience that multi-cores have. In this talk, we introduce an approach towards a Runtime-Aware Architecture, a massively parallel architecture designed from the runtime's perspective.

Mateo Valero is a full professor in the Computer Architecture Department at UPC in Barcelona. His research interests focus on high-performance architectures. He has published approximately 600 papers, has served in the organization of more than 300 international conferences, and has given more than 400 invited talks. He is the director of the Barcelona Supercomputing Center, the national supercomputing center of Spain.

Flyer (PDF)
  

Celebrating Yale Patt's 75th - Rethinking Computer Architecture (September 19, 2014)

Slides (PDF)
  

NPC 2014 (September 18, 2014)

VIA Technologies (China) Co., Ltd.: Invited Talk (September 13, 2014)

CHANGES 2014 (CHinese-AmericaN-German E-Science and cyberinfrastructure): Invited Talk in Beijing, China (September 11, 2014)

For the past several years, NCSA has been partnering with the Computer Network Information Center (CNIC) within the Chinese Academy of Sciences to hold the American-Chinese Cyberinfrastructure and E-Science workShop (ACCESS) on topics germane to high-performance computing. Beginning in 2012, that relationship has grown to include the Juelich Supercomputing Centre in Germany. This year's workshop, CHANGES 2014 (CHinese-AmericaN-German E-Science and cyberinfrastructure), will be held September 10-12 in Beijing, China, with the theme Application Scaling and the Use of Accelerators. I would like to invite you to present at this year's CHANGES Workshop in support of this theme. NCSA would cover all of your travel expenses to attend this workshop.

Invited Talk at OpenSoC Fabric Workshop: Intentional Programming for Productive Development of Portable, High Efficiency Software in the SoC Heterogeneous Computing Era (August 26, 2014)

In a modern SoC computing system, we have a large variety of computing devices such as CPU cores, GPUs, DSPs, and specialized micro-engines. The OpenCL 2.0 and HSA standards offer low-level programming interfaces that provide some basic level of portability. However, the algorithms are typically already baked into the kernel code. In this talk, I will advocate the use of Intentional Programming to better separate the concerns of what needs to be done from how the work should be done. This would allow better algorithm adjustment and more global optimization for applications. I will show some initial results from the Triolet-Python project.

Slides (PDF)
  

Christopher Rodrigues Doctoral Dissertation: Supporting High-Level, High-Performance Parallel Programming with Library-Driven Optimization

This dissertation presents Triolet, a programming language and compiler for high-level programming of parallel loops for high-performance execution on clusters of multicore computers. Triolet's design demonstrates that it is possible to decouple the design of a compiler from the implementation of parallelism without sacrificing performance or ease of use. This programming approach opens the potential for future research into parallel programming frameworks.

DoD ACS Productivity Workshop (July 16, 2014)

Wen-mei Hwu gave a presentation at the DoD ACS Productivity workshop. He first discussed the current status of performance portable programming using OpenCL and MxPA. He then gave a vision for drastic improvement of productivity in performance portable applications using new research breakthroughs from UIUC: Triolet and Tangram.

Slides (PDF)
  

Wen-Mei Hwu Awarded the Collins Award for Innovative Teaching

Hwu earned the Collins Award for Innovative Teaching, an award that recognizes outstanding development or use of new and innovative teaching methods. Hwu's major avenue of innovative teaching is his development of a parallel programming course, both in the ECE ILLINOIS classroom and across the world online. His massive open online course on parallel programming was first offered through Coursera in 2012, and a revised version of the class was offered earlier this year.

The course content has been so popular that it has also been translated into Chinese, Japanese, Russian, and other languages. Considering his pioneering efforts in teaching an increasingly relevant topic, and his commitment to using online platforms to reach a wide pool of students, it is no wonder that he was awarded the Collins Award for Innovative Teaching.

MediaTek Taiwan (Non-Public Talk) (June 30, 2014)

ISCA 2014 Tutorial: Heterogeneous System Architecture (HSA): Architecture and Algorithms Tutorial (June 15, 2014)

Heterogeneous computing is emerging as a requirement for power-efficient system design: modern platforms no longer rely on a single general-purpose processor, but instead benefit from dedicated processors tailored for each task.  Traditionally these specialized processors have been difficult to program due to separate memory spaces, kernel-driver-level interfaces, and specialized programming models.  The Heterogeneous System Architecture (HSA) aims to bridge this gap by providing a common system architecture and a basis for designing higher-level programming models for all devices.  This tutorial will bring in experts from member companies of the HSA Foundation to describe the Heterogeneous Systems Architecture and how it addresses the challenges of modern computing devices.  Additionally, the tutorial will show example applications and use cases that can benefit from the features of HSA.

Programming and Tuning Massively Parallel Systems summer school (PUMPS) (June 7, 2014)


The fifth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students, and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources like GPU accelerators. (July 7-11, 2014)

Cornell Departmental Lecture - Scalability, Portability, and Productivity in GPU Computing (April 7, 2014)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded heterogeneous computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, portability, and productivity in GPU computing. In this talk, I will give a brief overview of the NSF Blue Waters petascale heterogeneous parallel computing system at the University of Illinois. I will show experimental results of our achievements to date in applications, libraries, and MxPA. I will then discuss our current work on Tangram and Triolet projects that are aimed to drastically reduce the development and maintenance cost of heterogeneous parallel computing applications.

Slides (PDF)
  

Wen-Mei Hwu Speaks at Michigan Engineering (March 19, 2014)

Scalability, Performance, Stability, and Portability of Many-core Computing Algorithms

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, and portability while keeping development cost low. In this talk, I will go over the major building blocks involved: memory layout and dynamic tiling. I will show experimental results to demonstrate how these building blocks jointly enable the first scalable, numerically stable tri-diagonal solver that matches the numerical stability of the Intel Math Kernel Library (MKL) and surpasses the performance of CUSPARSE. I will then give an overview of the Tangram and Triolet projects, which aim to drastically improve the quality and reduce the development and maintenance cost of future many-core algorithms.

Flyer (PDF)
Slides (PDF)
  

Parboil Benchmark Contributions Receive the SPECtacular Award

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain, and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops benchmark suites and also reviews and publishes submitted results from its member organizations and other benchmark licensees.

The IMPACT Research Group, and specifically the work of Dr. John Stratton, was honored at the SPEC annual meeting in January with the SPECtacular Award for collaboratively developing the OpenCL benchmark suite, which has been adopted as part of the SPEC Accelerator Benchmark Suite (to be released soon). The Parboil benchmarks are a set of applications useful for studying the performance of throughput computing architectures and compilers. The benchmarks come from a variety of scientific and commercial fields including image processing, biomolecular simulation, fluid dynamics, and astronomy.

Wen-Mei Hwu Speaks at University of California, Riverside (March 10, 2014)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, and portability while keeping development cost low. In this talk, I will go over the major building blocks involved: memory layout and dynamic tiling. I will show experimental results to demonstrate how these building blocks jointly enable the first scalable, numerically stable tri-diagonal solver that matches the numerical stability of the Intel Math Kernel Library (MKL) and surpasses the performance of CUSPARSE. I will then give an overview of the Tangram and Triolet projects, which aim to drastically improve the quality and reduce the development and maintenance cost of future many-core algorithms.

Slides (PDF)
  

ECE UIUC Offers Coursera Heterogeneous Parallel Programming Course (January 6, 2014)

Wen-mei Hwu will be teaching the UIUC-Coursera Heterogeneous Parallel Programming Course in winter 2014.

This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. During the previous offering, 25,527 students enrolled, 15,000 students watched videos, 9,908 students participated in quizzes and labs, and 2,811 students earned a Certificate of Achievement or Certificate of Distinction. The contents and structure of the course have been significantly revised based on the experience gained. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, techniques for overlapping communication with computation, and parallel algorithm patterns. The launch date of the course is January 6, 2014.

2013
 

Wen-Mei Hwu Speaks at NVIDIA GPU Technology Theater at SuperComputing 2013 (November 20, 2013)

Wen-Mei Hwu presents two synergistic systems that enable productive development of scalable, efficient data-parallel code. Triolet is a Python-syntax based functional programming system where library implementers direct the compiler to perform parallelization and deep optimization. Tangram is an algorithm framework that supports effective parallelization of linear recurrence computation.

Slides (PDF)
  

Scalability, portability, and numerical stability in GPU computing (November 14, 2013)

IWCSE talk abstract: The rise of heterogeneous computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. However, there has been a lack of practical languages and compilers in this movement. In preparing petascale applications for deployment on Blue Waters, we see critical needs in these areas. I will discuss some recent progress in developing scalable, portable, and numerically stable libraries and several important research opportunities. I will also review some recent advancements in languages and compilers for developing scalable numerical libraries.

Wen-Mei Speaks at Computer Architecture Lab at Carnegie Mellon (October 28, 2013)

The prevailing perception about programming heterogeneous parallel computing systems today is that major types of computing devices, such as CPUs, GPUs, and DSPs, each require a different version of source code to achieve high performance. Even with a common data-parallel language such as OpenCL, most developers assume that they need to write different versions of source code for different device types. Unfortunately, such code versioning drastically increases the development, testing, and maintenance cost of applications. In this talk, I will present two systems, one at the OpenCL level (MxPA) and one at the functional programming level (Triolet), that enable single-source development of data-parallel code for diverse device types. MxPA is a commercial product today, and Triolet is a research prototype. For MxPA, I will show the key compiler transformations that enable a code base developed according to GPU performance guidelines to achieve high performance on multicore CPUs with SIMD instructions. I will also show the reasons why many GPU algorithms and CPU algorithms are converging. This brings up an interesting question: does the traditional definition of processor architectures really matter for application developers?

Webpage
  

Wen-Mei Hwu Speaks at Challenges in Genomics and Computing (July 22, 2013)

Wen-Mei Hwu gave a plenary talk, "Applications of Accelerators and other Technologies in Sequencing and their Success" and served as a panelist on the "Computing and Genomics: A Big Data Research Challenge" forum.

Challenges in Genomics and Computing met in Bangalore, India, July 22-24.

Slides (PDF)
  

Wen-mei Hwu gives 2013 Keynote at SAMOS Conference (June 15, 2013)

Wen-mei Hwu presented a keynote lecture, entitled Rethinking Computer Architecture for Throughput Computing, at the SAMOS XIII conference in Vathi, Samos Island, Greece. In the keynote, Wen-mei first presented results and lessons learned from porting major HPC science applications to utilize the 4096 Kepler GPUs in the Blue Waters supercomputer at the University of Illinois. He then discussed major challenges facing throughput computing applications in general. On one hand, commercial computer hardware has been undergoing a fast transformation to drastically increase the throughput of processing large amounts of data while keeping the power consumption in check. On the other hand, computer architecture has evolved too slowly to facilitate hardware innovations, software productivity, algorithm advancement, and user-perceived improvements. The research community must rethink the scope of computer architecture research as we seek to create growth paths for the computer systems industry.

Keynote Abstract at the 2013 Samos Conference (PDF)
Keynote Slides at the 2013 Samos Conference (PDF)
  

Wen-Mei Hwu Speaks at Cornell University (April 6, 2013)

The IMPACT group at the University of Illinois has been working on the co-design of scalable algorithms and programming tools for massively threaded heterogeneous computing. A major challenge that we are addressing is to simultaneously achieve scalability, performance, numerical stability, portability, and productivity in GPU computing. In this talk, I will give a brief overview of the NSF Blue Waters petascale heterogeneous parallel computing system at the University of Illinois. I will show experimental results of our achievements to date in applications, libraries, and MxPA. I will then discuss our current work on Tangram and Triolet projects that are aimed to drastically reduce the development and maintenance cost of heterogeneous parallel computing applications.

Hwu Highlighted in Russian Periodical

Wen-Mei Hwu was recently interviewed for an article which appeared in Russia's Supercomputers magazine. The current link shows a low-res copy of the original Russian article. We have requested an English translation and will post it when available.

Magazine Cover (JPG)
  

University of Illinois Snags 2013 CCOE Achievement Award

Researchers from the University of Illinois at Urbana-Champaign snagged the Second Annual Achievement Award for CUDA Centers of Excellence (CCOE) for their research with Blue Waters. The team was selected along with three other groups of researchers from CCOE institutions, which include some of the world's top universities, engaged in cutting-edge work with CUDA and GPU computing.

Each of the world's 21 CCOEs was asked to submit an abstract describing their top achievement in GPU computing over the past year. A panel of experts, led by NVIDIA Chief Scientist Bill Dally, selected four CCOEs to present their achievements at a special event during NVIDIA's annual GPU Technology Conference this week in San Jose. CCOE peers then voted for their favorite, which won bragging rights as the recipient of the second CUDA Achievement Award 2013.

The four finalists each received a GeForce GTX Titan, built around the same Kepler chip that powers the world's fastest supercomputer, the Titan system at Oak Ridge National Laboratory. The University of Illinois not only won bragging rights and a Titan, but also got to take home a Microsoft Surface tablet and a special NVIDIA green keyboard.

Achievement Award Submission (PDF)
  

Wen-mei Hwu Keynote at the 2013 HiPEAC Conference in Berlin, Germany (January 21, 2013)

Wen-mei Hwu presented a keynote lecture at the 2013 HiPEAC Conference in Berlin, Germany. Wen-mei observed that the rise of heterogeneous computing has significantly boosted the pace of progress in numeric methods, algorithm design, and programming techniques for developing scalable applications. However, there has been a lack of practical languages and compilers in this movement. He discussed some recent progress in developing scalable, portable, and numerically stable libraries and several important research opportunities. He then presented recent advancements and future needs in languages and compilers for developing scalable numerical libraries.

Abstract (PDF)
Slides (PDF)