Optimal Transport for Machine Learners

This page provides lecture notes and resources on optimal transport theory and its applications in machine learning, covering topics like the Wasserstein distance, computational methods (Sinkhorn algorithm), and use cases in domain adaptation, generative models, and gradient flows.

Background

- Optimal Transport (OT) is a mathematical framework, rooted in the 18th-century "earth mover's problem," for finding the most efficient way to transform one probability distribution into another. It provides a way to measure distance between distributions that respects the underlying geometry (e.g., features like color or space), unlike simpler statistical divergences.
- In machine learning, OT has become a powerful tool for tasks like domain adaptation (aligning a source dataset with a target dataset), generative modeling (e.g., Wasserstein GANs), and computing barycenters (averaging shapes or images while preserving geometric features).
- The linked page is a tutorial by Gabriel Peyré, a leading researcher in the field. It is aimed at machine learners, covering the core theory of OT and practical algorithms (like the Sinkhorn algorithm) that make it computationally tractable for high-dimensional data.
- Understanding OT requires familiarity with basic probability, linear algebra, and optimization; the page positions OT as an alternative to standard divergence measures (like KL or JS divergence) that better captures structure in data.
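
To make the Sinkhorn algorithm mentioned above more concrete, here is a minimal NumPy sketch of entropy-regularized OT between two histograms. It is an illustration under stated assumptions, not code taken from the tutorial: the function name `sinkhorn`, the parameters `eps` and `n_iters`, the 1D grid, and the Gaussian-shaped histograms are all hypothetical choices made for this example.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """Approximate entropy-regularized OT between histograms a and b.

    a, b : 1D arrays of positive weights, each summing to 1.
    C    : cost matrix, C[i, j] = cost of moving mass from bin i to bin j.
    eps  : regularization strength (assumed value; smaller is closer to
           unregularized OT but converges more slowly).
    """
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows to match marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
    P = u[:, None] * K * v[None, :]   # transport plan
    cost = float(np.sum(P * C))       # transport cost under the plan
    return P, cost

def bump(x, center, width=0.05):
    """Normalized Gaussian-shaped histogram on the grid x (illustrative)."""
    h = np.exp(-((x - center) ** 2) / (2 * width ** 2))
    return h / h.sum()

# Toy setup: three histograms on a grid over [0, 1], squared-distance cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
source = bump(x, 0.2)
near = bump(x, 0.4)
far = bump(x, 0.8)

_, cost_near = sinkhorn(source, near, C)
_, cost_far = sinkhorn(source, far, C)
print("cost to nearby target: ", cost_near)
print("cost to distant target:", cost_far)
```

In this toy setup the transport cost grows as the target moves farther from the source, which is the sense in which OT respects the underlying geometry. For very small `eps` the kernel can underflow, and a log-domain variant is usually preferred; that refinement is omitted here.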