New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught b...
A new course teaches efficient inference with SGLang, an open-source framework that reduces redundant computation when serving LLMs and diffusion models. The course covers implementing a KV cache, sharing cached prefixes across users with RadixAttention, and accelerating image generation, with the goal of making LLM inference faster and more cost-efficient at scale.
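To give a flavor of the core idea, here is a minimal sketch of KV caching during autoregressive decoding. This is not SGLang's implementation (RadixAttention additionally shares cached prefixes across requests in a radix tree); it is an illustrative toy, and all names and shapes below are assumptions for the example.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
# Stand-in key/value projection matrices for one attention head.
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

K_cache, V_cache = [], []   # the KV cache: grows by one entry per decoded token
outputs = []
for step in range(5):
    x = rng.normal(size=d)          # hidden state of the newest token
    K_cache.append(x @ Wk)          # compute K/V for the new token only...
    V_cache.append(x @ Wv)          # ...earlier tokens' K/V are reused, not recomputed
    outputs.append(attention(x, np.array(K_cache), np.array(V_cache)))

print(len(K_cache))  # one cached key vector per decoded token
```

Without the cache, each decode step would recompute keys and values for the entire prefix, making step `t` cost O(t) projections instead of O(1); that redundancy is exactly what KV caching, and cross-request sharing via RadixAttention, eliminates.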