Subquadratic – Introducing SubQ 1.1 Small

Subquadratic has released SubQ 1.1 Small, a 1.5B-parameter open-weight language model built on a soft-moe-2x8 architecture. It outperforms larger models such as Gemma 2 2.6B and Phi-2 2.8B on several benchmarks. The model uses subquadratic soft-MoE layers (MMA and MMAM) for improved efficiency.

A brief history of KV cache compression developments
KV cache compression techniques, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), Multi-head Latent Attention (MLA), and linear-attention hybrids, have evolved to reduce memory overhead in large language models. These developments have quietly enabled the long context windows required for modern agentic LLM applications by making key-value caching more efficient.
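
To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how the number of key-value heads drives cache size. The model shape (32 layers, 32 query heads, head dimension 128, 32,768-token context, fp16) is a hypothetical configuration chosen for illustration, not one taken from either story, and the helper `kv_cache_bytes` is an illustrative name. The sketch covers only standard multi-head attention, GQA, and MQA; MLA and linear-attention hybrids store different state and are not modeled here.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, for illustration only.
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 32_768

variants = {
    "MHA (one KV head per query head)": QUERY_HEADS,
    "GQA (8 KV heads shared by groups)": 8,
    "MQA (a single shared KV head)": 1,
}

for name, kv_heads in variants.items():
    size = kv_cache_bytes(LAYERS, kv_heads, HEAD_DIM, SEQ_LEN)
    print(f"{name}: {size / 2**30:.1f} GiB")
```

Under these assumed numbers, the per-sequence cache shrinks from 16 GiB with full multi-head attention to 4 GiB with 8 KV heads and 0.5 GiB with a single KV head, which is the kind of reduction that makes long context windows practical.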
