Xiao-Long Wu

Graduated: May, 2013
Email: xiaolongwill@gmail.com

SUMMARY

Recent research focuses on DNA sequence assembly, alignment, and compression, leveraging a computer engineering background to solve computational problems in bioinformatics and other fields. Since 2005, have coauthored 13 papers, 1 book chapter, 7 patents, and 1 provisional patent in the fields of genome sequence assembly, magnetic resonance imaging (MRI) with GPUs, compiler technologies for parallelism, and analog and digital circuit design. Have publicly released two software packages totaling about 80,000 source lines of code.

EDUCATION

Ph.D. Candidate, 2008 ~ present

Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Advisor: Wen-mei W. Hwu (ECE Professor; Fellow, IEEE and ACM), w-hwu@illinois.edu
Co-advisor: Jian Ma (BIOE Professor), jianma@illinois.edu

M.S. in Computer Science and Engineering, Yuan-Ze University, Taiwan, 2001.

B.S. in Computer Science and Engineering, Yuan-Ze University, Taiwan, 1999.

BIOGRAPHY

I received my B.S. and M.S. degrees in Computer Science and Engineering from Yuan-Ze University, Taiwan, in 1999 and 2001, respectively. Starting in 2001, I worked in Taiwan as an R&D engineer for 2 years at SynTest Technologies, an EDA testing tool vendor, and then for 3 years at Fortune Semiconductor Corp., a mixed-signal IC design house. In 2007, I co-founded Hycon Technologies Corp., Taiwan, a mixed-signal IC design house whose market value has since tripled, and served there as a senior R&D engineer. From 2006 to 2008, I was also pursuing a Ph.D. degree at the Graduate Institute of Electronics Engineering, National Taiwan University. After six months as a visiting student at UIUC, I decided to join the IMPACT research group in 2008 for a different life experience.

Before formally entering industry, I had many chances to work as a waiter, foreman, clerk, salesman, teaching assistant for programming classes, and a pay-by-the-piece programmer who hired several students to work for me. These experiences taught me to see things from the viewpoints of both employee and employer. I was also vice-president of the Student Association of the CSE Department, documentation officer of the Calligraphy Club, and captain of the Athletic Team and the Cycling Team, all at Yuan-Ze University. Through leading and being led, I came to understand that group culture makes a great difference: teamwork is easy to talk about but hard to do.

After years as a student, employee, employer, and entrepreneur, I firmly believe in integrity, hard work, humility, and gratitude. I believe these values apply to anyone, anywhere.

I have won dozens of honors and awards for leading university associations, competing in university-wide athletic games, and academic achievement. I have coauthored 13 papers, 1 book chapter, 7 patents, and 1 provisional patent.

My current research focuses on DNA sequence assembly, compiler design for parallelism, application acceleration, and debugging and performance tuning of parallel programs. I am also interested in hardware/software co-design, high-level synthesis, and low-power hardware/software design.

RESEARCH INTERESTS

The goal of my research is to apply the latest computing technologies to biomedical applications.

DNA Sequence Assembly (Fall 2010 ~ Present)

Among scientific disciplines, genomics has one of the fastest-growing bodies of data today. This is largely due to recent advances in next-generation sequencing (NGS) technologies, which have tremendously reduced DNA sequencing costs. This massive amount of sequencing data has provided the basis for better understanding the tree of life and for identifying molecular signatures of human variation and disease mechanisms. To make such analyses possible, the key computational task is to assemble the raw short sequences (called reads) produced by NGS technologies de novo into complete or near-complete genomes. However, the enormous amount of data creates an unavoidable barrier to the assembly process in terms of memory usage and computation time. Assembling an entire human genome usually takes days to weeks and requires a machine with hundreds of gigabytes of memory and hundreds of processors. In addition, the lower quality and limited length of NGS reads, compared with traditional Sanger sequencing, make it extremely difficult to assemble reads into long scaffolds, which are essential for analyzing large-scale genome rearrangements.

This project aims to make assembly feasible on a few inexpensive commodity computers while still producing results of the same or better quality in a reasonable time. We take an iterative assembly approach to assembling the given genome read libraries. Preliminary experiments show that our approach requires only a machine with 4 GB of memory and produces better-quality results than two well-known assemblers.
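For readers outside genomics, the core data structure behind many short-read assemblers is the de Bruijn graph: every read is cut into overlapping k-mers, and unambiguous paths through the resulting graph are merged into longer contigs. The Python sketch below illustrates only that generic idea; it is not TIGER's tiled iterative algorithm, and it assumes error-free reads from a single strand (no reverse complements, no coverage filtering). The function names `build_de_bruijn` and `assemble_contigs` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer in a read becomes an edge from its
    (k-1)-base prefix to its (k-1)-base suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Merge maximal non-branching paths into contigs (simplified sketch)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node):
        # "Simple" means exactly one edge enters and exactly one edge leaves.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # simple nodes lie inside a path, never at its start
        for nxt in sorted(graph.get(start, ())):
            contig = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


# Two overlapping reads collapse into a single contig.
print(assemble_contigs(build_de_bruijn(["ATGGCGT", "GGCGTGC"], 4)))  # ['ATGGCGTGC']
```

Real assemblers must also cope with sequencing errors, repeats, and both DNA strands, and storing such a graph for a whole human genome is where the memory cost described above comes from.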

Magnetic Resonance Imaging (MRI) Acceleration (Spring 2009 ~ Present)

In clinical environments, MR image reconstruction usually trades image quality for shorter processing time. Image quality is sacrificed at every stage, from acquiring patient data to reconstructing images. Various approaches, for example Cartesian data sampling combined with the fast Fourier transform (FFT), have been proposed to balance image accuracy against processing time. In recent years, the prevalence of high-performance GPUs has opened a new era for the MRI field. Abundant, inexpensive computing power motivates us to revisit compute-intensive reconstruction algorithms, such as field inhomogeneity compensation, and better data acquisition methods, such as Sensitivity Encoding for fast MRI (SENSE), to improve image quality and shorten data sampling time for patients. Over the past three years, our research team has explored these possibilities and made good progress in this field.

Our toolkit implements a fast iterative MRI reconstruction algorithm using SENSE and includes image-smoothness and edge-sharpening functionality. The toolkit can handle 2D images of up to 512^2 pixels and 3D images of up to 256^2 x 32 voxels, which usually take hours to days of CPU time but now take only minutes on a GPU. The speedup we achieved is about two orders of magnitude. To benefit the community, the toolkit is distributed at http://impact.crhc.illinois.edu/mri.php
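Iterative reconstruction of this kind is commonly posed as a regularized least-squares problem, minimizing ||A x - b||^2 + λ||x||^2, where A is the encoding operator (coil sensitivities combined with a Fourier transform), b is the acquired k-space data, and x is the image. Conjugate gradient (CG) on the normal equations (A^H A + λI) x = A^H b is a standard solver for it. The pure-Python sketch below assumes a small dense matrix in place of the real operator and simple Tikhonov regularization in place of the toolkit's smoothness and edge-sharpening terms, so it shows the numerical pattern only, not the IMPATIENT implementation. All function names are hypothetical.

```python
def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def adjoint_matvec(A, y):
    """Compute A^H y (the conjugate transpose of A applied to y)."""
    return [sum(A[r][c].conjugate() * y[r] for r in range(len(A)))
            for c in range(len(A[0]))]


def inner(u, v):
    """Complex inner product <u, v> = sum(conj(u_i) * v_i)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def cg_reconstruct(A, b, lam=0.0, max_iters=50, tol=1e-20):
    """Solve (A^H A + lam*I) x = A^H b with the conjugate gradient method."""
    def normal_op(x):
        ahax = adjoint_matvec(A, matvec(A, x))
        return [p + lam * xi for p, xi in zip(ahax, x)]

    x = [0j] * len(A[0])
    r = adjoint_matvec(A, b)   # residual of the normal equations at x = 0
    p = list(r)                # first search direction
    rs_old = inner(r, r).real
    for _ in range(max_iters):
        if rs_old <= tol:
            break              # squared residual norm small enough: converged
        ap = normal_op(p)
        alpha = rs_old / inner(p, ap).real
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = inner(r, r).real
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[1, 0], [0, 2], [1, 1]]   # 3 measurements of a 2-pixel "image"
b = matvec(A, [1, -1])         # noiseless data from the true image [1, -1]
x = cg_reconstruct(A, b)
print([round(v.real, 6) for v in x])  # [1.0, -1.0]
```

In a real scanner setting A would typically never be formed explicitly; `matvec` and `adjoint_matvec` would be replaced by (non-uniform) FFTs and coil-sensitivity multiplications, and those repeated operator applications are the part that benefits most from GPUs.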

Compiler Technologies for Parallelism

Parallel program development, particularly for Single Instruction, Multiple Thread (SIMT) models such as CUDA and OpenCL, can make tailoring an application to the underlying GPU hardware a difficult process. Current efforts to ease this difficulty focus on providing libraries (e.g., Thrust), better debugging capabilities (e.g., CUDA printf, cuda-gdb), performance information extraction (e.g., the CUDA profiler), and so on. However, these SIMT languages are intrinsically designed for performance-driven programming, so these supplemental tools struggle to provide the full benefits of a general-purpose language like C/C++. For example, testing code cannot be mixed with functional code, and no exception-handling scheme is provided. Moreover, because the language is designed for the SIMT model, data manipulation is thread-oriented. This favors translation to machine code but hurts programmability, so the supplemental debugging tools are not as user-friendly as the traditional printf and gdb. All of this makes it difficult to develop large, performance-demanding applications such as the MRI and DNA sequence assembly projects, which motivates this research project.

This research proposes a high-level language that facilitates debugging and performance tuning by emulating the underlying SIMT hardware architecture and providing a data-oriented manipulation interface on top of it. The fundamental feature of this language, HCUDA, is a Single Instruction, Multiple Data (SIMD) programming model that ensures correctness and is portable across SIMT implementations without requiring a real GPU device. Since HCUDA is implemented as a subset of the C++ language, the existing C/C++ tool chain can be used seamlessly. In addition, performance tuning of individual kernels can be expedited by identifying computational hot spots and memory bottlenecks at a fine granularity. Using this language as a tool, developers can reduce debugging effort and quickly determine the regions suitable for performance tuning.
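The general idea of running SIMT code without a GPU can be illustrated independently of HCUDA, whose actual interface is not shown here. In the hypothetical Python sketch below, kernels are written as generator functions; `launch` runs each thread of a 1-D block as a generator, treats every `yield` as a `__syncthreads()` barrier, and raises an error when only some threads of a block reach a barrier, a bug that leads to undefined behavior on real hardware. Shared memory is modeled as one list of `block_dim` elements per block; warps, memory coalescing, and timing are not modeled. The example kernel `block_sum` is a standard tree reduction and assumes `block_dim` is a power of two.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch on the CPU, one block at a time."""
    for block_idx in range(grid_dim):
        shared = [0] * block_dim  # per-block "shared memory"
        active = [kernel(block_idx, tid, block_dim, shared, *args)
                  for tid in range(block_dim)]
        while active:
            # Advance every live thread to its next barrier (or to its end).
            at_barrier = []
            for thread in active:
                try:
                    next(thread)
                    at_barrier.append(thread)
                except StopIteration:
                    pass
            if at_barrier and len(at_barrier) != len(active):
                raise RuntimeError(
                    f"block {block_idx}: only {len(at_barrier)} of "
                    f"{len(active)} threads reached the barrier")
            active = at_barrier


def block_sum(block_idx, tid, block_dim, shared, data, out):
    """Tree reduction: out[block_idx] = sum of this block's slice of data."""
    i = block_idx * block_dim + tid
    shared[tid] = data[i] if i < len(data) else 0
    yield  # __syncthreads()
    stride = block_dim // 2
    while stride > 0:
        if tid < stride:
            shared[tid] += shared[tid + stride]
        yield  # __syncthreads()
        stride //= 2
    if tid == 0:
        out[block_idx] = shared[0]


partial = [0, 0]
launch(block_sum, 2, 4, list(range(1, 9)), partial)
print(partial)  # [10, 26]
```

Because every emulated thread is ordinary host code, a standard debugger or print statement works inside the kernel, which is the kind of tool-chain reuse the paragraph above describes.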

HONORS & RECORDS
2012	• Invited talk on "TIGER: Tiled iterative genome assembler," at the invitation of Dr. Wen-Hsiung Li, Biodiversity Research Center, Academia Sinica, Taiwan, September 2012.
	• Awarded the ECE Computer Engineering Fellowship sponsored by Intel Corporation, UIUC, for 2012-2013. This award is presented to a doctoral candidate in the Department of Electrical and Computer Engineering who has demonstrated excellence in research in the area of computer engineering.
2010	• Awarded the SPIE Contingency Student Travel Grant for presenting research at the IS&T/SPIE Electronic Imaging 2011 Conference.
	• Invited session chair of the First International Workshop on Frontier of GPU Computing (FGC 2010) in conjunction with the 10th IEEE International Conference on Computer and Information Technology (CIT 2010).
	• Awarded the Dan Vivoli Fellowship in Electrical and Computer Engineering, UIUC, for 2010-2011. This award is presented to a doctoral candidate who has demonstrated excellence in research specializing in the use of Graphics Processing Units (GPUs) for parallel computing.
	• Nominated as a member of the Graduate Student Advisory Council at the Virtual School of Computational Science and Engineering, UIUC. This honor is given to a teaching assistant who has demonstrated great effort and contributions to the Virtual School of Computational Science and Engineering.
2001	• First Class of Yuan-Ze Medal on Science, Yuan-Ze University. This award is given to students with great scholastic or research achievements.
2000	• Second Place in the IC/CAD Contest, held by the Ministry of Education, Taiwan. This is a nationwide contest whose problems are provided by industrial companies.
1999	• Awarded Citation for Participating Actively in Group Activities, Yuan-Ze University.
1995~1997	• Awarded several athletic citations, including championships in the 1500 m and 5000 m races, among others, at games held by Yuan-Ze University and other government associations.
1997	• Awarded Second Class of Yuan-Ze Medal on Association Activities, Yuan-Ze University.
	• Awarded Second Place among Student Associations for leading the Student Association of the Department of Computer Science and Engineering, Yuan-Ze University.
	• Awarded First Place among Literature and Art Associations for leading the Calligraphy Club, Yuan-Ze University.

RELEASED SOFTWARE PACKAGES

04/2009 ~ Present

"IMPATIENT MRI Toolset", an implementation of 30,000-line source code in C++ and CUDA for iterative MR image reconstruction using GPUs. It is used in the research of medical imaging, especially in the area of image reconstruction for magnetic resonance imaging (MRI).

Source code: http://impact.crhc.illinois.edu/mri.aspx

06/2010 ~ Present

"TIGER: Tiled iterative genome assembler", an implementation of 47,000-line source code in C++ (with OpenMP and MPI enabled) for iterative genome sequence assembly. Nowadays, assembling a human genome takes a computer cluster with hundreds of processors and up to Tera-bytes of memory, which is not affordable to most DNA sequencing research centers. This project aims to make this happen on a couple of cheap commodity computers while producing better quality results within reasonable time.

Binary: http://impact.crhc.illinois.edu/Tiger/tiger.aspx

SKILLS

Software:
* Compiler technology development, with experience using SUIF, Machine-SUIF, and LLVM.
* Application parallelization through CUDA, OpenMP, MPI, and Pthreads.
* C/C++ program performance enhancement.
* Database design in SQL, VBasic.

Hardware:
* Digital IC design via cell-based design and full-custom layout.
* Digital IC design EDA flow construction from Verilog/VHDL design to SPICE simulations.
* Memory chip testing.
* EDA tool development.
* Unix/Linux system administration.

Links
Resume available on request

Papers/Patents/Book Chapters

Publications on Genome Sequence Assembly

2012 • Xiao-Long Wu, Y. Heo, I. El Hajj, W. W. Hwu, D. Chen, J. Ma, "TIGER: Tiled iterative genome assembler," BMC Bioinformatics, 2012.

• Xiao-Long Wu, Y. Heo, W. W. Hwu, D. Chen, J. Ma, "TIGER: Tiled iterative genome assembler," US provisional patent # 61/683,358

Publications on Magnetic Resonance Imaging (MRI) with GPUs

2013 • J. Gai, N. Obeid, J. L. Holtrop, Xiao-Long Wu, F. Lam, M. Fu, J. P. Haldar, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "More IMPATIENT: A Gridding-Accelerated Toeplitz-based Strategy for Non-Cartesian High-Resolution 3D MRI on GPUs," Journal of Parallel and Distributed Computing (JPDC), 2013.

2012 • J. Gai, J. L. Holtrop, Xiao-Long Wu, F. Lam, M. Fu, J. P. Haldar, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "More IMPATIENT: A Gridding-Accelerated Toeplitz-based Strategy for Non-Cartesian High-Resolution 3D MRI on GPU," Proceedings of the International Society for Magnetic Resonance in Medicine (ISMRM), May 2012.

2011 • Xiao-Long Wu, J. Gai, F. Lam, M. Fu, J. P. Haldar, Y. Zhuo, Z.-P. Liang, W. W. Hwu, B. P. Sutton, "IMPATIENT MRI: Illinois Massively Parallel Acceleration Toolkit for Image reconstruction with ENhanced Throughput in MRI," Proceedings of the International Society for Magnetic Resonance in Medicine (ISMRM), May 2011.

• Xiao-Long Wu, J. Gai, F. Lam, M. Fu, J. P. Haldar, Y. Zhuo, Z.-P. Liang, W. W. Hwu, B. P. Sutton, "IMPATIENT MRI: Illinois Massively Parallel Acceleration Toolkit for Image reconstruction with ENhanced Throughput in MRI," Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), March 2011.

• Xiao-Long Wu, J. Gai, F. Lam, M. Fu, J. P. Haldar, Y. Zhuo, Z.-P. Liang, W. W. Hwu, B. P. Sutton, "Advanced MRI Reconstruction Toolbox with Accelerating on GPU," Proceedings of the IS&T/SPIE Electronic Imaging 2011 Conference on "Parallel Processing for Imaging Applications", January 2011.

• Y. Zhuo, Xiao-Long Wu, J. P. Haldar, T. Marin, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "The Role of GPUs in Advancing Clinical Imaging with Magnetic Resonance Imaging," in GPU Computing Gems, Book Chapter 44, W. W. Hwu, Ed., Elsevier Inc., 2011.

2010 • Y. Zhuo, Xiao-Long Wu, J. Haldar, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "Sparse Regularization in MRI Iterative Reconstruction using GPUs," Proceedings of the 3rd International Conference on BioMedical Engineering and Informatics (BMEI'10), October 2010.

• Y. Zhuo, Xiao-Long Wu, J. P. Haldar, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "Multi-GPU Implementation for Iterative MR Image Reconstruction with Field Correction," Proceedings of the International Society for Magnetic Resonance in Medicine (ISMRM), May 2010.

• Y. Zhuo, Xiao-Long Wu, J. P. Haldar, W. W. Hwu, Z.-P. Liang, B. P. Sutton, "Accelerating Iterative Field-Compensated MR Image Reconstruction on GPUs," Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), April 2010.

Publications on Compiler Technologies for Parallelism

2010 • Xiao-Long Wu, N. Obeid, W. W. Hwu, "Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures," Proceedings of the 10th IEEE International Conference on Computer and Information Technology (CIT 2010), pp. 1175-1180, June 2010.

2009 • G.-H. Lin, Y.-N. Wen, Xiao-Long Wu, S.-J. Chen, and A. P. Su, "SIMD Code Generation for Multimedia Application," International Journal of Electrical Engineering, vol. 16, no. 1, pp. 1-12, February 2009.

2008 • H.-K. Chen, Xiao-Long Wu, and S.-J. Chen, "A Compiler Front-End Framework for Parallelism Profiling," Proceedings of the VLSI Design/CAD Symposium, August 2008.

2007 • G.-H. Lin, Y.-N. Wen, Xiao-Long Wu, S.-J. Chen, and Y.-H. Hu, "Design of a SIMD Multimedia SoC Platform," Proceedings of the IEEE International SOC Conference (SOCC), September 2007.

Publications on Analog and Digital Circuit Design

2007 • Xiao-Long Wu, Y.-C. Huang, "Fuse Repair Circuit and Its Operating Method," US Patent # 20070164807

• Xiao-Long Wu, Y.-C. Huang, "One-Time Programmable Memory and Method of Burning Data of the Same," US Patent # 20070153609

2006 • Y.-C. Huang, Xiao-Long Wu, "Fuse Burning Check Circuitry," Taiwan Patent # 095109448, China Patent # 200610066837.1

• Y.-C. Huang, Xiao-Long Wu, "Fuse Burning Check Circuitry: Enhanced Version," Taiwan Patent # 095109447, China Patent # 200610066839.0

• Y.-C. Huang, Xiao-Long Wu, "A Fuse Cell with Two Power Supplies," Taiwan Patent # 298495, China Patent # 200610066838.6

• Y.-C. Huang, Xiao-Long Wu, "Fuse Burning Control Circuitry," Taiwan Patent # 095100142, China Patent # 200510132946.4

2005 • Xiao-Long Wu, Y.-C. Huang, "One-Time Programmable Memory and Method of Burning Data of the Same," Taiwan Patent # 269306, China Patent # 200510132163.6

Maintained by Xiao-Long Wu (xiaolong at illinois dot edu)