Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Event: PRODUCT_LAUNCH | Date: 2026-06-06 | Impact: MEDIUM

arXiv:2604.23466v2

Abstract: NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw