Distributing LLM Inference in DwarfStar
Salvatore Sanfilippo introduces DwarfStar, a proof-of-concept that distributes LLM inference across multiple machines. Its protocol runs over Unix sockets, stdout, and HTTP, and it uses pipeline parallelism to run models too large to fit in a single GPU's VRAM.
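
To make the idea of pipeline parallelism concrete, here is a minimal sketch in Python. It is not DwarfStar's code or protocol: the names `partition_layers`, `run_stage`, `run_pipeline`, and `make_layer` are hypothetical, the "layers" are toy functions standing in for transformer blocks, and the stages run in one process rather than on separate machines. It only shows the core mechanism: the model's layers are split into contiguous groups, each group would live on a different machine, and only the activations cross the boundary between stages.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function from activations to activations.
# In a real model this would be a transformer block (assumption for illustration).
Layer = Callable[[List[float]], List[float]]


def partition_layers(layers: Sequence[Layer], num_stages: int) -> List[List[Layer]]:
    """Split layers into num_stages contiguous groups of near-equal size."""
    if num_stages < 1 or num_stages > len(layers):
        raise ValueError("num_stages must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_stages)
    stages: List[List[Layer]] = []
    start = 0
    for i in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if i < extra else 0)
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_stage(stage: List[Layer], activations: List[float]) -> List[float]:
    """Run one stage's layers in order; this is the work one machine would do."""
    for layer in stage:
        activations = layer(activations)
    return activations


def run_pipeline(stages: List[List[Layer]], inputs: List[float]) -> List[float]:
    """Pass activations from stage to stage.

    In a distributed setup each hand-off would be a network transfer;
    here it is a plain function call.
    """
    activations = inputs
    for stage in stages:
        activations = run_stage(stage, activations)
    return activations


def make_layer(scale: float, shift: float) -> Layer:
    """Build a toy layer computing scale * x + shift element-wise."""
    def layer(x: List[float]) -> List[float]:
        return [scale * v + shift for v in x]
    return layer


# Five toy layers split across two stages (3 layers, then 2).
layers = [make_layer(2.0, 1.0) for _ in range(5)]
stages = partition_layers(layers, 2)
output = run_pipeline(stages, [1.0])
```

Because each stage holds only its own slice of the layers, no single machine needs enough memory for the whole model. The cost is that stages execute one after another for a given token, so the transfer between stages adds latency.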