Luce KVFlash: 256K context with 72 MiB of KV cache on the GPU
Luce KVFlash is a memory optimization for long-sequence inference: it supports 256K-token context windows while using only 72 MiB of key-value (KV) cache on the GPU. It does this by compressing KV cache storage.
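
To put the headline figures in perspective, the back-of-the-envelope sketch below works out how many GPU bytes per token a 72 MiB budget leaves across a 256K context, and compares that with an uncompressed fp16 KV cache. It assumes "256K" means 256 × 1024 tokens. The model shape used for the comparison (32 layers, 8 KV heads, head dimension 128) is a hypothetical example chosen for illustration, not a configuration taken from Luce KVFlash, and the helper name is likewise made up.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Uncompressed KV cache size for one token (hypothetical helper).

    The leading factor of 2 accounts for storing both a key and a value
    vector per layer and per KV head; bytes_per_elem=2 corresponds to fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Figures from the text, assuming 256K = 256 * 1024 tokens.
context_tokens = 256 * 1024
gpu_budget_bytes = 72 * MIB

# Average GPU-resident KV bytes available per token of context.
budget_per_token = gpu_budget_bytes / context_tokens

# Hypothetical model shape, for comparison only (not from the source).
full_per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
full_cache_gib = full_per_token * context_tokens / GIB

print(f"GPU budget per token:     {budget_per_token:.0f} bytes")
print(f"Uncompressed per token:   {full_per_token} bytes")
print(f"Uncompressed 256K cache:  {full_cache_gib:.0f} GiB")
print(f"Reduction needed on GPU:  {full_per_token / budget_per_token:.0f}x")
```

Under these assumptions the 72 MiB budget averages 288 bytes per token, against 128 KiB per token (32 GiB in total) for the uncompressed cache of the example model. The exact ratio depends entirely on the real model shape and precision, which the text does not state.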