Moebius: 0.2B image inpainting model with 10B-level performance
Researchers from Huazhong University of Science and Technology have developed Moebius, an image inpainting model with only 0.2 billion parameters that achieves performance comparable to 10-billion-parameter models, significantly reducing computational cost while maintaining high-quality results.
Background
- **Moebius** is a new image inpainting model with 0.2 billion parameters. Its authors report that it matches or exceeds models roughly 50× larger (about 10B parameters), such as the state-of-the-art **FLUX Fill** (by Black Forest Labs) or **Stable Diffusion 3.5** variants. Inpainting means realistically reconstructing missing or removed parts of an image.
- The paper comes from **HUST** (Huazhong University of Science and Technology) and **VILA Lab**, an academic computer vision group. The efficiency is attributed to a "mixture-of-experts" design in which only a fraction of the model is active for any given inference.
- The key innovation is a **"trident-shaped distillation"** strategy: training a small student model to mimic the inpainting behavior of a huge teacher model (likely FLUX Fill) on three parallel tasks (masked images, unmasked images, and a noising/denoising process).
- This matters because current high-quality inpainting requires massive models that are slow and expensive to run. Moebius suggests a path toward running professional-level image inpainting on consumer hardware (laptops, phones, or free-tier GPUs).
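
To make the distillation idea concrete, here is a minimal sketch of a three-branch student-teacher loss. It is a generic illustration and not the paper's method: the summary above does not specify the loss function, the branch weighting, or the model interfaces, so the per-branch mean squared error, the equal weights, and every name here (`three_branch_distillation_loss`, `branch_inputs`, the toy `teacher` and `student`) are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], Vector]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def three_branch_distillation_loss(
    student: Model,
    teacher: Model,
    branch_inputs: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-branch student-vs-teacher errors.

    Assumption: each branch contributes an MSE term between the student's
    and the teacher's output on that branch's input.
    """
    total = 0.0
    for name, x in branch_inputs.items():
        total += weights[name] * mse(student(x), teacher(x))
    return total


# Toy stand-ins for the real networks (hypothetical, for illustration only).
def teacher(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def student(x: Vector) -> Vector:
    return [1.5 * v for v in x]


# One small input per branch: masked image, unmasked image, noised sample.
branch_inputs = {
    "masked": [1.0, 0.0, 2.0],
    "unmasked": [1.0, 3.0, 2.0],
    "noised": [0.5, -0.5, 1.0],
}
weights = {"masked": 1.0, "unmasked": 1.0, "noised": 1.0}  # assumed equal

loss = three_branch_distillation_loss(student, teacher, branch_inputs, weights)
print(f"distillation loss: {loss:.4f}")
```

In a real training loop the vectors would be image or latent tensors and the loss would be minimized by gradient descent on the student's parameters. The loss reaches zero only when the student reproduces the teacher on all three branches.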