Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
This paper introduces scalable packed layouts for vector-length-agnostic (VLA) machine learning code generation. It proposes techniques for packing and unpacking data on architectures whose vector length is not fixed at compile time, so that a single ML kernel remains portable across hardware with different vector widths without giving up performance.
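To make the idea of a scalable packed layout concrete, the following is a minimal sketch in plain Python and not the paper's implementation. It assumes that the panel width is a base tile size multiplied by a run-time factor, called `vscale` here, that stands in for the hardware vector length. The function names `pack_columns` and `unpack_columns`, the `base_tile` parameter, and the zero-padding policy are all hypothetical choices made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def pack_columns(matrix: Matrix, vscale: int, base_tile: int = 4,
                 pad: float = 0.0) -> List[Matrix]:
    """Split the columns of `matrix` into panels of width base_tile * vscale.

    `vscale` is an assumed stand-in for the run-time vector-length multiplier,
    so the panel width is only known when the function is called.
    The last panel is padded with `pad` if the column count is not a multiple
    of the panel width.
    """
    tile = base_tile * vscale
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    num_panels = -(-cols // tile)  # ceiling division

    packed: List[Matrix] = []
    for p in range(num_panels):
        start = p * tile
        panel: Matrix = []
        for row in matrix:
            chunk = row[start:start + tile]
            # Pad the tail so every panel row has exactly `tile` elements.
            panel.append(chunk + [pad] * (tile - len(chunk)))
        packed.append(panel)
    return packed


def unpack_columns(packed: List[Matrix], cols: int) -> Matrix:
    """Invert pack_columns, dropping padding beyond the original `cols`."""
    if not packed:
        return []
    rows = len(packed[0])
    out: Matrix = []
    for r in range(rows):
        row: List[float] = []
        for panel in packed:
            row.extend(panel[r])
        out.append(row[:cols])
    return out
```

The same source matrix produces a different number of panels depending on `vscale`, while the round trip through `unpack_columns` recovers the original data. Under these assumptions, that is the property that lets one kernel be written once and run on hardware with different vector widths.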