From Hacker News

Fixing a kubelet Memory Leak in Kubernetes 1.36

A kubelet memory leak in Kubernetes 1.36 is caused by kube-controller-manager sending excessive PATCH requests for completed Pods. The fix involves reverting behavior with feature flags or upgrading to a patched version.

Background

- **kubelet** is the primary "node agent" that runs on every machine in a Kubernetes cluster. It is responsible for ensuring that containers are running in Pods as expected. A memory leak here means the process gradually consumes more and more RAM, eventually causing node instability.
- **Kubernetes 1.36** refers to a specific version of the open-source container orchestration platform. The article describes a real-world bug found in this version's kubelet code.
- The leak was traced to the **PLEG (Pod Lifecycle Event Generator)** component, which periodically relists containers to detect state changes. In 1.36, a code change caused these relist operations to accumulate internal data structures without proper cleanup.
- The fix involved backporting a patch (PR #129879) from an upstream commit, correcting how the kubelet manages its internal container cache during relists.

The bug highlights the importance of monitoring kubelet memory usage and the value of keeping Kubernetes versions reasonably up to date for bug fixes.
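One way to act on the monitoring advice is to sample the kubelet's resident memory over time and check whether it keeps climbing. The following is a minimal sketch, not taken from the article. It assumes the kubelet's Prometheus-format metrics output includes the standard `process_resident_memory_bytes` gauge, and the helper names and the 50 MiB/hour threshold are hypothetical choices for illustration.

```python
# Sketch: detect steady growth in kubelet resident memory from metrics samples.
# Assumption: the metrics text contains the standard process gauge below.
# Function names and the default threshold are illustrative, not from the article.

METRIC = "process_resident_memory_bytes"


def parse_rss_bytes(metrics_text: str) -> float:
    """Extract the resident memory value from Prometheus text-format output."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comment lines; match only the sample line.
        if line.startswith(METRIC + " "):
            return float(line.split()[1])
    raise ValueError(f"{METRIC} not found in metrics output")


def growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, rss_bytes) pairs, in bytes/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 3600


def looks_like_leak(
    samples: list[tuple[float, float]],
    threshold_bytes_per_hour: float = 50 * 1024 * 1024,  # arbitrary example value
) -> bool:
    """Flag sustained growth above the chosen threshold."""
    return growth_per_hour(samples) > threshold_bytes_per_hour
```

In practice the metrics text could come from a periodic scrape, for example `kubectl get --raw /api/v1/nodes/<node>/proxy/metrics`, with each parsed value stored alongside its timestamp. A slope check like this only indicates a trend; a suitable threshold and sampling window depend on the cluster and would need tuning.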