TopicTracker
From HackerNews

KV Cache Compression 900000x Beyond TurboQuant and Per-Vector Shannon Limit

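The research proposes a KV cache compression method that it claims achieves a compression ratio 900,000 times higher than TurboQuant and even exceeds the per-vector Shannon limit. Using a new compression strategy, the technique substantially reduces memory usage during large language model inference, which could make deployment more efficient.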
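The summary does not describe how the method works. For context on why KV cache size dominates long-context inference memory, here is a minimal sketch of the standard uncompressed size estimate (one K and one V tensor per layer, per token). The model dimensions below are hypothetical, and the compression ratios are illustrative only. They are not figures from the research.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float = 2.0, batch: int = 1) -> float:
    """Uncompressed KV cache size: one K and one V tensor per layer, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def compressed_bytes(raw_bytes: float, ratio: float) -> float:
    """Size after an overall compression ratio, defined as raw size / compressed size."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_bytes / ratio


if __name__ == "__main__":
    # Hypothetical config (assumed): 32 layers, 8 KV heads (grouped-query attention),
    # head_dim 128, fp16 storage (2 bytes per element), 1M-token context.
    raw = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=1_000_000)
    print(f"fp16 KV cache: {raw / 2**30:.1f} GiB")
    for ratio in (4, 16):  # illustrative ratios, not results from the article
        print(f"{ratio}x compressed: {compressed_bytes(raw, ratio) / 2**30:.2f} GiB")
```

With these assumed dimensions, each token costs 128 KiB of cache. That per-token cost is why compression ratios translate directly into how much context fits in accelerator memory.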

Related coverage

  • Anthropic has introduced a 1 million token context window for its Claude Opus 4.6 and Sonnet 4.6 models and is offering the larger context at no additional charge.

  • The article explains how to package Perl and shell scripts for deployment on NixOS, covering dependency management and reproducible builds. It demonstrates creating Nix expressions to handle Perl modules and shell dependencies in the Nix ecosystem.

  • llm-openrouter 0.6 adds a new "llm openrouter refresh" command that allows users to refresh the list of available models without waiting for cache expiration. This feature was added to enable immediate access to new models like Kimi 2.6 on OpenRouter.