Serving Local AI on My Jetson Through Durable Streams
The author details running local AI models on an NVIDIA Jetson device, using Durable Streams via the s2.dev platform to handle processing and data management. This setup enables efficient, persistent streaming for AI workloads on edge hardware.
Background
- NVIDIA Jetson is a line of compact, power-efficient computing boards designed for AI workloads at the edge (e.g., robots, drones, cameras) rather than in a cloud data center. They're popular with hobbyists, researchers, and industrial developers who need local inference.
- "Durable streams" refers to the reliable, persistent data streams provided by S2 (the s2.dev platform named above): think message queues that survive crashes and network issues. The author uses them to serve AI model outputs.
- The post sits at the intersection of two trends: 1) running large language models (LLMs) locally instead of through cloud services like ChatGPT, and 2) building robust infrastructure for these local setups.
- The main challenge addressed: a Jetson has limited RAM and no high-end GPU, so serving AI models from it reliably is harder than serving them from a desktop or server.
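
To make the "message queue that survives crashes" idea concrete, here is a minimal sketch of a durable, append-only stream backed by a local file. It is not the S2 API and not the author's code: the `DurableStream` class, its `append` and `read` methods, and the `demo` helper are all hypothetical names chosen for illustration. It assumes a single writer and shows only the two properties that matter here: records are flushed to disk before an append returns, and a reader can resume from any sequence number after a restart.

```python
import json
import os
import tempfile


class DurableStream:
    """Toy append-only log (hypothetical, not the S2 API).

    One JSON record per line; every append is fsynced so it survives a crash.
    Assumes a single writer process.
    """

    def __init__(self, path):
        self.path = path
        # Create the file if it does not exist yet.
        open(path, "a", encoding="utf-8").close()
        # Recover the next sequence number from what is already on disk.
        with open(path, encoding="utf-8") as f:
            self.next_seq = sum(1 for _ in f)

    def append(self, record):
        """Persist one record and return its sequence number."""
        seq = self.next_seq
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"seq": seq, "body": record}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the write to stable storage
        self.next_seq += 1
        return seq

    def read(self, from_seq=0):
        """Yield (seq, body) pairs starting at from_seq."""
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["seq"] >= from_seq:
                    yield entry["seq"], entry["body"]


def demo():
    """Write model tokens, simulate a restart, then resume reading."""
    path = os.path.join(tempfile.mkdtemp(), "tokens.log")

    producer = DurableStream(path)
    for token in ["Hello", ",", " world"]:
        producer.append(token)

    # Simulated restart: a fresh object recovers its state from the file.
    reopened = DurableStream(path)
    reopened.append("!")

    # A client that had already seen records 0 and 1 resumes at 2.
    return [body for _, body in reopened.read(from_seq=2)]


if __name__ == "__main__":
    print(demo())
```

In this sketch a client that disconnects mid-generation only needs to remember the last sequence number it saw. A hosted stream service adds what a local file cannot, such as access over the network and multiple concurrent readers.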