Adding Features Without Interrupting Network Connections
The article discusses techniques for adding new features to network-connected applications without disrupting existing network connections. It covers strategies like connection migration, protocol version negotiation, and graceful upgrade mechanisms to maintain service continuity during updates.
Background
- The author is **Evan Jones**, a software engineer at **Exa** (formerly Metaphor), an AI-search startup backed by Y Combinator that runs a large-scale web crawler.
- **TCP (Transmission Control Protocol)** is the foundational internet protocol that delivers data between computers reliably and in order. HTTP/1.1 and HTTP/2, including their HTTPS variants, run over TCP; HTTP/3 is the exception and runs over QUIC on top of UDP.
- The article focuses on **HTTP keep-alive connections**: instead of opening a fresh TCP connection for every request, which costs a TCP handshake (plus a TLS handshake for HTTPS) each time, the client reuses an existing one. For a crawler handling millions of URLs, keeping connections alive is critical for performance.
- The central problem: **an open TCP socket is tied to the operating system process that holds it**. When you deploy new code (e.g., to add a feature that changes how connections are managed), the old process is replaced, and unless its sockets are explicitly passed to the new process, the existing open connections cannot carry over. They must be closed gracefully.
- The piece walks through practical strategies (connection-draining timeouts, health-check APIs, and staged rollouts) for updating live services without abruptly killing in-flight network connections or dropping data.
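
To make the connection-draining idea concrete, here is a minimal Python sketch. It is not taken from the article: the `DrainingServer` class, its `drain` method, and the timeout values are hypothetical names chosen for illustration. The sketch assumes a toy echo server in which draining means "stop accepting new connections, then wait a bounded time for open ones to close".

```python
import socket
import threading


class DrainingServer:
    """Hypothetical toy echo server that can drain connections before exiting."""

    def __init__(self, host="127.0.0.1", port=0):
        self._listener = socket.create_server((host, port))
        # Short accept timeout so the accept loop can notice the stop flag.
        self._listener.settimeout(0.1)
        self.address = self._listener.getsockname()
        self._active = 0  # connections currently being served
        self._cond = threading.Condition()
        self._stopping = threading.Event()
        self._accept_thread = threading.Thread(target=self._accept_loop, daemon=True)
        self._accept_thread.start()

    def _accept_loop(self):
        while not self._stopping.is_set():
            try:
                conn, _ = self._listener.accept()
            except socket.timeout:
                continue  # no new client yet; re-check the stop flag
            with self._cond:
                self._active += 1
            threading.Thread(target=self._serve, args=(conn,), daemon=True).start()
        self._listener.close()  # draining: refuse any further connections

    def _serve(self, conn):
        try:
            with conn:
                # Echo until the client closes its side of the connection.
                while data := conn.recv(4096):
                    conn.sendall(data)
        except OSError:
            pass
        finally:
            with self._cond:
                self._active -= 1
                self._cond.notify_all()

    def drain(self, timeout):
        """Stop accepting, then wait up to `timeout` seconds for open
        connections to finish. Returns True if all of them closed in time."""
        self._stopping.set()
        self._accept_thread.join()
        with self._cond:
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)
```

A deploy script built on this sketch would call `drain(timeout)` on the old process and only terminate it once the call returns `True` or the deadline passes. What to do with connections that are still open at the deadline (force-close them or extend the wait) is a policy choice this sketch leaves open.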