Cache Merging as a Convergent Replicated State for Multi-Agent Latent Reasoning
This paper explores treating merged caches as a convergent replicated state for multi-agent latent reasoning, with the aim of improving efficiency and coherence in distributed AI reasoning systems.
Background
- This paper introduces "cache merging," a technique that merges the internal memory (KV cache) of multiple large language model inference runs to enable multi-agent reasoning without a central orchestrator.
- The KV cache holds the attention key and value tensors an LLM computes for the tokens it has already processed during generation; normally each agent's cache is isolated. By merging caches, agents can share partially completed reasoning and converge on a shared latent state.
- The method targets "multi-agent latent reasoning": letting several LLM instances work on the same problem by exchanging intermediate representations rather than natural language.
- A key motivation: it avoids the overhead of a centralized controller or a separate aggregation model, since the merge itself acts as the coordination primitive.
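
To make the "convergent replicated state" idea in the bullets above concrete, here is a minimal sketch. It is not the paper's merge procedure, which this summary does not specify. It assumes, purely for illustration, that each cache slot is identified by a hypothetical (agent_id, token_position) pair and that conflicting writes to the same slot are resolved by a deterministic tie-break. The names ToyCache, SlotId, and Slot are invented here. Under those assumptions the merge is commutative, associative, and idempotent, so agents that exchange caches in any order end up with the same state without a coordinator.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple

# Hypothetical slot identifier: (agent_id, token_position).
SlotId = Tuple[str, int]
# Hypothetical slot payload: (key_vector, value_vector) as tuples of floats.
Slot = Tuple[Tuple[float, ...], Tuple[float, ...]]


@dataclass
class ToyCache:
    """Toy stand-in for one agent's KV cache (single layer, single head)."""
    slots: Dict[SlotId, Slot] = field(default_factory=dict)

    def append(self, agent_id: str, position: int,
               key: Sequence[float], value: Sequence[float]) -> None:
        # Store vectors as tuples so slots are immutable and comparable.
        self.slots[(agent_id, position)] = (tuple(key), tuple(value))

    def merge(self, other: "ToyCache") -> "ToyCache":
        """Return the union of both caches without mutating either one.

        Assumption: when both caches hold the same slot id, the larger
        payload under tuple ordering wins. Any deterministic, order-independent
        rule would do; this one just keeps the merge commutative.
        """
        merged = dict(self.slots)
        for slot_id, slot in other.slots.items():
            if slot_id in merged:
                merged[slot_id] = max(merged[slot_id], slot)
            else:
                merged[slot_id] = slot
        return ToyCache(merged)


a = ToyCache()
a.append("agent_a", 0, [0.1, 0.2], [0.3, 0.4])
b = ToyCache()
b.append("agent_b", 0, [0.5, 0.6], [0.7, 0.8])
c = ToyCache()
c.append("agent_a", 0, [0.9, 0.0], [0.1, 0.1])  # conflicting write to agent_a's slot

ab = a.merge(b)
assert ab == b.merge(a)                            # order of exchange does not matter
assert ab.merge(c) == a.merge(b.merge(c))          # grouping does not matter
assert a.merge(a) == a                             # re-delivering a cache is harmless
```

The sketch deliberately ignores everything that makes real KV caches hard to combine, such as per-layer tensors and position-dependent encodings; it only shows why a merge with these three algebraic properties can serve as the coordination primitive.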