Subquadratic – SubQ 1.1 Small model released

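Subquadratic has released SubQ 1.1 Small, an efficient language model built on a subquadratic attention mechanism. While retaining the inference advantage of linear complexity, the model further improves training efficiency and performance, offering a more cost-effective option for long-sequence tasks.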

Related coverage

A brief history of KV cache compression developments
KV cache compression techniques, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), Multi-head Latent Attention (MLA), and linear-attention hybrids, have evolved to reduce memory overhead in large language models. These developments have quietly enabled the long context windows required for modern agentic LLM applications by making key-value caching more efficient.
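
As a rough illustration of why these techniques matter for memory, the sketch below estimates KV cache size from the standard per-token accounting (keys plus values, per layer, per KV head). The model shape (32 layers, 32 query heads, head dimension 128, 128,000-token context, 2-byte elements) is a hypothetical example and is not taken from any model mentioned here. MLA and linear-attention hybrids store a different kind of state and are not modeled.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


# Hypothetical model shape, chosen only for illustration.
LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 128_000

# MHA: one KV head per query head.
mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)
# GQA: query heads share KV heads in groups (here, 8 KV heads).
gqa = kv_cache_bytes(LAYERS, 8, HEAD_DIM, SEQ_LEN)
# MQA: all query heads share a single KV head.
mqa = kv_cache_bytes(LAYERS, 1, HEAD_DIM, SEQ_LEN)

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.2f} GiB")
```

Under these assumptions the cache shrinks in direct proportion to the number of KV heads, so GQA with 8 KV heads needs a quarter of the MHA cache and MQA needs 1/32 of it.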