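Distributing LLM Inference in DwarfStar

With high-end NVIDIA cards climbing in price, Apple hardware such as the Mac Studio and MacBook Pro has become a credible alternative for local LLM inference. The DwarfStar project considers three approaches to distributed inference across multiple Macs: sequential execution with the model's layers split across machines, parallel execution over RDMA, and fully independent execution through model ensembling. Of these, ensembling open-weight models is drawing attention as a promising technique, since it incurs no communication overhead and lets the models complement one another's knowledge.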
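
The summary does not say how the ensemble combines its members' outputs. As a minimal sketch, assuming each machine runs a different open-weight model to completion and the answers are merged by simple majority vote (one possible combiner, not necessarily what DwarfStar does), the fully independent approach could look like the following. The function name `ensemble_answer` and the idea of passing each model as a plain callable are hypothetical, chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

# Hypothetical interface: each "model" is a callable that takes a prompt
# and returns a complete answer string. In practice it would wrap a
# request to a separate machine running its own model.
Model = Callable[[str], str]


def ensemble_answer(prompt: str, models: Sequence[Model]) -> Tuple[str, int]:
    """Query every model independently and return the most common answer.

    The models never exchange data while generating, so the only
    coordination cost is collecting the final answers.
    """
    if not models:
        raise ValueError("at least one model is required")

    # Run all models concurrently; each call is fully independent.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda model: model(prompt), models))

    # Normalize lightly so trivial formatting differences do not split votes.
    normalized = [answer.strip().lower() for answer in answers]

    # Majority vote (assumed combiner); ties go to the earliest answer.
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes
```

Majority voting only makes sense for short, comparable answers; for free-form text a different combiner (for example, a judge model or reranker) would be needed.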

Antirez introduces DwarfStar, a prototype system for distributing LLM inference across multiple nodes. The approach partitions the model's layers across machines, using a fan-out pattern where a coordinator sends tokens to all layer groups in parallel during prefill, reducing inference latency. DwarfStar is designed to run on low-end hardware, aiming for cost-effective, decentralized inference.
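
To make the layer-partitioning idea concrete, here is a minimal sketch that assigns contiguous layer ranges to machines. It assumes layers are split in proportion to each node's memory, which is an illustrative policy and not something the summary attributes to DwarfStar. The `Node` class, `partition_layers` function, and the machine names and memory sizes in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str         # hypothetical machine label
    memory_gb: float  # memory available for model weights


def partition_layers(num_layers: int, nodes: List[Node]) -> Dict[str, range]:
    """Assign contiguous layer ranges to nodes, proportional to memory.

    Assumption: all layers are roughly the same size, so memory share
    translates directly into layer count.
    """
    if num_layers <= 0 or not nodes:
        raise ValueError("need at least one layer and one node")

    total_memory = sum(node.memory_gb for node in nodes)
    assignments: Dict[str, range] = {}
    start = 0
    cumulative = 0.0

    for index, node in enumerate(nodes):
        cumulative += node.memory_gb
        if index == len(nodes) - 1:
            end = num_layers  # last node takes whatever remains
        else:
            # Cut at the cumulative memory share to avoid rounding drift.
            end = round(num_layers * cumulative / total_memory)
        end = max(end, start)
        assignments[node.name] = range(start, end)
        start = end

    return assignments


if __name__ == "__main__":
    cluster = [Node("mac-studio", 192), Node("macbook-pro", 64)]
    for name, layers in partition_layers(64, cluster).items():
        print(f"{name}: layers {layers.start}..{layers.stop - 1}")
```

Keeping each node's layers contiguous means activations cross the network only once per boundary between machines.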
