Triton: an intermediate language and compiler for tiled neural network computations (paper)

2019 · Cited by 233
Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing

Abstract

The validation and deployment of novel research ideas in the field of Deep Learning is often limited by the availability of efficient compute kernels for certain basic primitives. In particular, operations that cannot leverage existing vendor libraries (e.g., cuBLAS, cuDNN) are at risk of facing poor device utilization unless custom implementations are written by experts – usually at the expense of portability. For this reason, the development of new programming abstractions for specifying custom Deep Learning workloads at a minimal performance cost has become crucial.