Deep learning's reliance on matrix multiplication (GEMM) for compute has driven both research and industry to develop matrix-multiplication accelerator hardware, collectively called Tensor Core Units (TCUs) in this paper. TCUs are designed to accelerate Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs), or Deep Neural Networks (DNNs) in general. TCUs come under the guise of different marketing terms, be it NVIDIA's Tensor Cores [54], Google's Tensor Processing Unit, Intel's DLBoost, Apple A11's Neural Engine, Tesla's HW3, or ARM's ML Processor. They vary in the underlying hardware implementation, and are prevalent in both cloud and edge devices. More information is available at https://tcu.c3sr.com
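To make the premise concrete, the forward pass of a single MLP layer reduces to exactly the GEMM that TCUs accelerate. The following is a minimal NumPy sketch; the shapes and the fp16 precision are illustrative assumptions, not values taken from this paper:

```python
import numpy as np

# Illustrative shapes (assumptions for this sketch, not from the paper).
batch, d_in, d_out = 32, 256, 128

# fp16 inputs mirror the reduced-precision operands typical of TCUs.
x = np.random.randn(batch, d_in).astype(np.float16)   # activations
w = np.random.randn(d_in, d_out).astype(np.float16)   # layer weights
b = np.zeros(d_out, dtype=np.float16)                 # bias

# The layer's forward pass is a GEMM plus a bias add -- the operation
# that TCU hardware is designed to execute.
y = x @ w + b
print(y.shape)  # (32, 128)
```

CNN and RNN layers are likewise commonly lowered to GEMM (e.g., convolution via im2col), which is why a single class of accelerator serves all three.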