Input-Aware Auto-Tuning of Compute-Bound HPC Kernels
Authors
Event Type: Paper
Time: Thursday, November 16th, 3:30pm - 4pm
Location: 402-403-404
Description: Efficient implementations of HPC applications for
parallel architectures generally rely on external
software packages (e.g., BLAS, LAPACK, cuDNN). While
these libraries provide highly optimized routines for
certain characteristics of inputs (e.g., square
matrices), they generally don't retain optimal
performance across the wide range of problems
encountered in practice. In this paper, we present
ISAAC, an input-aware auto-tuning framework for matrix
multiplications and convolutions, capable of generating
not only hardware-specific but also application-specific compute
kernels, by combining highly parameterized PTX kernel
templates with data-driven performance modeling.
Numerical experiments on the NVIDIA Maxwell/Pascal
architectures show up to 3x performance gains over both
cuBLAS and cuDNN after only a few hours of
auto-tuning.
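
The idea of input-aware tuning can be illustrated with a minimal sketch. This is not ISAAC's implementation: the tile-size search space, the function names, and the hand-written cost formula below are all assumptions made for illustration, and the formula stands in for the data-driven performance model that the abstract describes. The sketch only shows the selection step, in which the kernel parameters are chosen per input shape rather than once per device.

```python
import math
from itertools import product

# Hypothetical tuning parameters: tile sizes along the M and N dimensions.
TILE_SIZES = (16, 32, 64, 128)


def predicted_cost(m, n, k, tile_m, tile_n):
    """Toy stand-in for a learned performance model (lower is better)."""
    # Work actually executed once M and N are padded up to whole tiles.
    padded_m = math.ceil(m / tile_m) * tile_m
    padded_n = math.ceil(n / tile_n) * tile_n
    padded_work = padded_m * padded_n * k
    # Larger tiles reuse loaded data more: multiply-adds per element loaded.
    reuse = (tile_m * tile_n) / (tile_m + tile_n)
    return padded_work / reuse


def select_tiles(m, n, k):
    """Return the (tile_m, tile_n) pair with the lowest predicted cost."""
    return min(
        product(TILE_SIZES, repeat=2),
        key=lambda tiles: predicted_cost(m, n, k, *tiles),
    )


if __name__ == "__main__":
    print(select_tiles(4096, 4096, 4096))  # square problem
    print(select_tiles(16, 4096, 4096))    # skinny problem
```

Under this toy model a square 4096 x 4096 x 4096 multiplication selects (128, 128), while a skinny 16 x 4096 x 4096 one selects (16, 128), because large tiles along M would mostly compute padding. A real tuner would replace `predicted_cost` with a model fitted to benchmark measurements on the target GPU.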