Scalable GANs with Transformers
The paper introduces Scalable GANs with Transformers (SGT), a new family of generative models that combines GAN training with transformer architectures. By addressing the training instability that usually affects adversarial models, SGT reportedly reaches state-of-the-art image generation quality on ImageNet while using fewer parameters than previous models.
Background
- GANs (Generative Adversarial Networks, 2014) pit a "generator" that creates fake images against a "discriminator" that tries to tell those fakes apart from real images. Training them at high resolution is notoriously unstable.
- Transformers (the architecture behind ChatGPT) now dominate much of AI, but they are hard to stabilize in adversarial (GAN) training. As a result, most modern image generation uses diffusion models such as Stable Diffusion instead.
- This paper introduces GT-VQGAN, a new GAN architecture built on transformer components, plus "latent consistency regularization" to keep training stable at scale.
- If the approach holds up, it could make GANs competitive with diffusion again: a GAN generator produces an image in a single forward pass (fast), while diffusion models need many sequential, slow denoising steps.
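To make the generator-versus-discriminator setup above concrete, here is a minimal sketch of the standard GAN losses from the original 2014 formulation, using the common "non-saturating" generator loss. This is the generic textbook objective, not SGT's training recipe: the paper's latent consistency regularization is not shown, and the function names are hypothetical. The discriminator outputs a raw score (logit) per image, where a higher score means "looks real".

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def discriminator_loss(real_logits, fake_logits):
    """Hypothetical helper: standard binary cross-entropy for the discriminator.

    Rewards high scores on real images and low scores on generated ones.
    -log(sigmoid(r))     == softplus(-r)
    -log(1 - sigmoid(f)) == softplus(f)
    """
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term

def generator_loss(fake_logits):
    """Hypothetical helper: non-saturating generator loss.

    The generator wants its fakes to be scored as real, so it minimizes
    -log(sigmoid(f)) on the discriminator's scores for generated images.
    """
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)

# A discriminator that confidently separates real (+5) from fake (-5)
# has a small loss, while the generator's loss is large: the generator
# is being "caught" and must improve.
print(discriminator_loss([5.0], [-5.0]))
print(generator_loss([-5.0]))
```

The two losses pull in opposite directions on the same scores, which is the source of the instability mentioned above: each network's target moves whenever the other one updates.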