Translation

Show HN: Self hosting a modern LLM stack

The project "llmaker" provides a setup for self-hosting a modern LLM stack, enabling users to run large language models locally. It includes tools and configurations to manage models, APIs, and interfaces for privacy and control.

Background

- This GitHub project ("llmaker") provides a script to self-host a full modern LLM stack — including a model server (llama.cpp or vLLM), an inference API (OpenAI-compatible), a vector database, and a web UI — on your own hardware. - "Self-hosting" means running AI models locally on your own machine (or a rented server) instead of relying on cloud services like OpenAI or Anthropic, giving you full control over data, costs, and privacy. - The key components: llama.cpp (efficient CPU/GPU inference for LLMs), vLLM (a high-throughput inference engine for GPUs), and various open-source LLMs (like Llama, Mistral, Qwen) that can run on consumer hardware. - This matters because it lowers the barrier for individuals or small teams to run capable AI models privately and offline, without paying per-token API fees or sending data to third parties — a growing trend as open-weight models improve and hardware gets cheaper.