
© 2026 TopicTracker

Source: HackerNews

GateGPT: A Transformer with KV cache running at 80 MHz on an FPGA, 56k tokens per second

GateGPT, a Transformer inference accelerator, is drawing attention for reaching 56,000 tokens per second on an FPGA clocked at only 80 MHz. By using a KV cache, it eases memory-bandwidth constraints and achieves high throughput.
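The two figures quoted above imply a per-token cycle budget, which a few lines of Python can make concrete. This is only a back-of-the-envelope sketch: the helper name `cycles_per_token` is made up for illustration, and it assumes tokens are produced one after another at a steady rate, which the summary does not confirm.

```python
def cycles_per_token(clock_hz: float, tokens_per_second: float) -> float:
    """Average clock cycles available per generated token.

    Assumes a steady, sequential token rate (an assumption, not a
    detail stated in the source).
    """
    return clock_hz / tokens_per_second


CLOCK_HZ = 80_000_000       # 80 MHz, as quoted in the summary
TOKENS_PER_SECOND = 56_000  # 56k tokens/s, as quoted in the summary

budget = cycles_per_token(CLOCK_HZ, TOKENS_PER_SECOND)
print(f"{budget:.0f} clock cycles per token")
```

Under those assumptions the budget is roughly 1,400 cycles per token, which suggests a very small model or heavy parallelism in the hardware; the summary does not say which.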

Related articles

A brief history of KV cache compression developments
5.0
KV cache compression techniques, including Multi-Query Attention (MQA), Grouped-Query Attention (GQA), Multi-head Latent Attention (MLA), and linear-attention hybrids, have evolved to reduce memory overhead in large language models. These developments have quietly enabled the long context windows required for modern agentic LLM applications by making key-value caching more efficient.
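To give a rough sense of why sharing key-value heads reduces memory, the sketch below compares cache size for standard multi-head attention (MHA), GQA, and MQA. The model dimensions (32 layers, 32 query heads, head size 128, 4,096-token context, 16-bit values) are hypothetical and not taken from any article mentioned here, and `kv_cache_bytes` is an illustrative helper. MLA and linear-attention hybrids work differently and are not modeled.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model configuration, chosen only for illustration.
LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096

# Number of key-value heads under each scheme.
variants = {
    "MHA": QUERY_HEADS,   # one KV head per query head
    "GQA (8 groups)": 8,  # query heads share 8 KV heads
    "MQA": 1,             # all query heads share one KV head
}

sizes = {name: kv_cache_bytes(LAYERS, heads, HEAD_DIM, SEQ_LEN)
         for name, heads in variants.items()}

for name, size in sizes.items():
    print(f"{name}: {size / 2**20:.0f} MiB")
```

With these assumed dimensions the cache shrinks in proportion to the number of key-value heads, so GQA with 8 groups needs a quarter of the MHA cache and MQA needs one thirty-second of it.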