Tensor Core Compiler

Dakkak, Abdul; Li, Cheng

Deep learning's reliance on matrix multiplication (GEMM) for compute has driven both research and industry to develop matrix-multiplication accelerator hardware, collectively called Tensor Core Units (TCUs) in this paper. TCUs are designed to accelerate Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), or Deep Neural Networks (DNN) in general. TCUs come under the guise of different marketing terms, be it NVIDIA's Tensor Cores [54], Google's Tensor Processing Unit, Intel's DLBoost, Apple A11's Neural Engine, Tesla's HW3, or ARM's ML Processor. They vary in the underlying hardware implementation, and are prevalent in both cloud and edge devices. More information is available at https://tcu.c3sr.com