TopicTracker

Source: HackerNews

SubQ 1.1 Card: Linear-scaling sparse attention achieves 98% retrieval accuracy at 12 million tokens [pdf]

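This article introduces the model card for SubQ 1.1. SubQ 1.1 uses a linear-scaling sparse attention mechanism and achieves 98% retrieval accuracy at a context length of 12 million tokens. This approach makes it possible to process very long sequences efficiently.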

Related articles

A brief history of KV cache compression developments
KV cache compression techniques, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), Multi-head Latent Attention (MLA), and linear-attention hybrids, have evolved to reduce memory overhead in large language models. These developments have quietly enabled the long context windows required for modern agentic LLM applications by making key-value caching more efficient.