CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
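The paper introduces CODA, a framework that rewrites transformer blocks as GEMM-epilogue programs. Each block is expressed as general matrix multiplications (GEMMs) together with the post-processing (epilogue) operations applied to their outputs. Treating a matrix multiplication and its post-processing as one unified, composable computation pattern is what enables flexible and efficient execution.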
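A GEMM epilogue is the post-processing applied to a matrix multiplication's output before it is written back to memory, for example a bias add, an activation, or a residual add. The minimal sketch below illustrates that general pattern in plain Python. It assumes a simple interface in which an epilogue is an ordered list of per-element functions. The names `gemm_epilogue`, `bias_add`, `gelu_tanh` and `residual_add` are hypothetical and do not reflect CODA's actual API. The usage line expresses a transformer feed-forward projection, GELU(XW + b), as one GEMM followed by a two-op epilogue.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical epilogue signature (not CODA's): (value, row, col) -> new value.
EpilogueOp = Callable[[float, int, int], float]


def gemm_epilogue(a: Sequence[Sequence[float]],
                  b: Sequence[Sequence[float]],
                  epilogue: List[EpilogueOp]) -> List[List[float]]:
    """Compute C = A @ B, running the epilogue ops on each output element
    as soon as its accumulator is complete, before it is stored."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Fused post-processing: applied in order, no intermediate matrix.
            for op in epilogue:
                acc = op(acc, i, j)
            out[i][j] = acc
    return out


def bias_add(bias: Sequence[float]) -> EpilogueOp:
    """Add a per-column bias, as in a linear layer."""
    return lambda v, i, j: v + bias[j]


def gelu_tanh(v: float, i: int, j: int) -> float:
    """Tanh approximation of GELU, a common transformer activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def residual_add(residual: Sequence[Sequence[float]]) -> EpilogueOp:
    """Add a skip connection element-wise."""
    return lambda v, i, j: v + residual[i][j]


# Usage: a feed-forward projection GELU(X W + b) as one GEMM-epilogue program.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
h = gemm_epilogue(x, w, [bias_add([0.5, -2.0]), gelu_tanh])
print(h)
```

Keeping the epilogue as a list means new post-processing steps can be composed without touching the multiplication loop. A production kernel would apply the epilogue to register-resident output tiles rather than to individual elements in Python.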