Subquadratic – Launches the SubQ 1.1 Small Model
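Subquadratic has released SubQ 1.1 Small, an efficient language model built on a subquadratic attention mechanism. While retaining the inference advantage of linear complexity, the model further improves training efficiency and performance, offering a more cost-effective solution for long-sequence tasks.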
KV cache compression techniques, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), Multi-head Latent Attention (MLA), and linear-attention hybrids, have evolved to reduce the memory that key-value caching consumes in large language models. By shrinking the cache, these developments have quietly enabled the long context windows that modern agentic LLM applications require.
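
To make the memory argument concrete, here is a minimal sketch of the standard KV cache size arithmetic for ordinary multi-head attention, GQA, and MQA. The model shape (32 layers, 32 query heads, head dimension 128, fp16, a 128K-token context) is a hypothetical configuration chosen for illustration and does not describe any specific model; the function name `kv_cache_bytes` is likewise illustrative. MLA and linear-attention hybrids store different state and are not modeled here.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Bytes needed to cache keys and values for one batch of sequences.

    The leading factor of 2 covers the separate key and value tensors.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


# Hypothetical model shape, for illustration only.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
HEAD_DIM = 128
SEQ_LEN = 128 * 1024  # 128K-token context

GIB = 2 ** 30

# The variants differ only in how many key/value heads are cached.
variants = {
    "MHA (one KV head per query head)": NUM_QUERY_HEADS,
    "GQA (8 KV groups)": 8,
    "MQA (single shared KV head)": 1,
}

sizes_gib = {
    name: kv_cache_bytes(NUM_LAYERS, kv_heads, HEAD_DIM, SEQ_LEN) / GIB
    for name, kv_heads in variants.items()
}

for name, size in sizes_gib.items():
    print(f"{name}: {size:.0f} GiB")
```

Under these assumptions the cache shrinks from 64 GiB to 16 GiB with GQA and to 2 GiB with MQA, because the cache size scales linearly with the number of key/value heads while the number of query heads stays unchanged.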