Embodied.cpp: A Portable Inference Runtime of Embodied AI Models

Embodied.cpp is a portable, open-source C++ runtime for deploying embodied AI models on edge devices. It supports a range of vision-language-action models and achieves low-latency inference on consumer-grade hardware, including CPUs, NVIDIA Jetson, and Apple Silicon.

Background

Embodied.cpp is an open-source C++ inference runtime for embodied AI models, that is, models for robots and agents that perceive and act in the physical world. It is designed to run these models efficiently on consumer hardware such as laptops and phones, without cloud servers or expensive GPUs. The design is modeled after llama.cpp, the popular tool that made it practical to run large language models on everyday computers.

The significance is accessibility. Embodied AI has largely been limited to well-funded labs with server-class infrastructure; Embodied.cpp aims to democratize it by running models such as vision-language-action (VLA) systems locally, with low latency and support for standard robotics interfaces (ROS 2, MQTT).

As prior context, most embodied AI research relies on Python-based stacks (PyTorch, ROS) that are heavy and power-hungry, which makes edge deployment impractical. Embodied.cpp directly addresses that bottleneck.