SubQ 1.1 Card: Linear-scaling sparse attention with 98% retrieval at 12M tokens [pdf]
SubQ 1.1 introduces a sparse attention mechanism whose cost scales linearly with sequence length and that maintains 98% retrieval accuracy at 12 million tokens. This extends the usable context length of large language models while requiring less computation than full attention, whose cost grows quadratically with sequence length.
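
To make the linear-versus-quadratic comparison concrete, here is a minimal back-of-the-envelope sketch in Python. It is not SubQ's actual mechanism, which the summary does not describe. It assumes a generic sparse scheme in which each query attends to a fixed budget of keys, and the budget `KEYS_PER_QUERY = 4096` is a hypothetical value chosen only for illustration. The function names are likewise made up for this example.

```python
# Rough cost model: count query-key score computations per attention layer.
# Causal masking and constant factors are ignored for simplicity.

def full_attention_pairs(n_tokens: int) -> int:
    """Full attention scores every query against every key: n^2 pairs."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Assumed sparse scheme: each query scores a fixed budget of keys,
    so the pair count grows linearly in n once n exceeds the budget."""
    return n_tokens * min(keys_per_query, n_tokens)


N_TOKENS = 12_000_000      # context length quoted in the summary
KEYS_PER_QUERY = 4096      # hypothetical budget, not taken from the source

full = full_attention_pairs(N_TOKENS)
sparse = sparse_attention_pairs(N_TOKENS, KEYS_PER_QUERY)

print(f"full attention pairs:   {full:.3e}")
print(f"sparse attention pairs: {sparse:.3e}")
print(f"reduction factor:       {full / sparse:.1f}x")
```

Under these assumptions, doubling the context doubles the sparse pair count but quadruples the full-attention pair count, which is why the gap widens as the context grows.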