XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
Best Paper
Publication Year:
  Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu

Machine learning/deep learning (ML) models have recently proliferated and been widely adopted across application domains. This has made the profiling and characterization of ML models an increasingly pressing task for both hardware designers and system providers, who want to offer the best possible computing system to serve ML models with the desired latency, throughput, and energy requirements while maximizing resource utilization. Such an endeavor is challenging because the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and hardware (the HW/SW stack). A thorough characterization requires understanding the behavior of model execution across the levels of the HW/SW stack. Existing profiling tools are disjoint, however, and each focuses on profiling within a single level of the stack. This largely limits the types of analysis that can be performed on model executions.
This paper proposes XSP, an across-stack profiling design, along with a leveled experimentation methodology to capture a holistic view of ML model execution on GPUs. XSP leverages distributed tracing to capture and correlate profiles from different sources. XSP accurately captures the profiles at each level of the stack in spite of the overheads incurred by the profiling providers. We couple the profiling design with an automatic analysis pipeline to systematically analyze 65 state-of-the-art ML models. Through this characterization, we demonstrate that XSP provides insights, which are otherwise difficult to discern, into the characteristics of ML models, ML frameworks, and GPU hardware.
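To illustrate the idea of correlating profiles from different stack levels as spans in a single trace, here is a minimal Python sketch. It is not XSP's implementation: the `Span` class, the `correlate` function, the level numbering (0 = model, 1 = layer, 2 = GPU kernel), and the sample timestamps are all hypothetical. It assumes every span carries start and end timestamps on a shared clock and that a child span lies entirely within its parent's interval, an assumption that may not hold for asynchronously launched GPU work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Span:
    name: str
    level: int    # hypothetical numbering: 0 = model, 1 = layer, 2 = GPU kernel
    start: float  # timestamps on a shared clock (assumed)
    end: float
    parent: Optional["Span"] = None
    children: List["Span"] = field(default_factory=list)


def correlate(spans: List[Span]) -> List[Span]:
    """Attach each span to the tightest enclosing span one level above it.

    Returns the spans that ended up without a parent (the trace roots).
    """
    for child in spans:
        candidates = [
            p for p in spans
            if p.level == child.level - 1
            and p.start <= child.start
            and child.end <= p.end
        ]
        if candidates:
            # Prefer the shortest enclosing interval as the parent.
            parent = min(candidates, key=lambda p: p.end - p.start)
            child.parent = parent
            parent.children.append(child)
    return [s for s in spans if s.parent is None]


# Made-up spans, as if reported by three separate profilers.
spans = [
    Span("model_predict", 0, 0.0, 10.0),
    Span("conv1", 1, 1.0, 4.0),
    Span("relu1", 1, 4.5, 6.0),
    Span("gemm_kernel", 2, 1.5, 3.0),
    Span("relu_kernel", 2, 4.8, 5.5),
]
roots = correlate(spans)

for root in roots:
    for layer in root.children:
        kernels = [k.name for k in layer.children]
        print(f"{root.name} -> {layer.name} -> {kernels}")
```

The quadratic scan keeps the sketch short; an interval tree would be the usual choice for traces with many spans.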