Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
This post covers profiling techniques in PyTorch, focusing on optimizing MLP layers by fusing multiple nn.Linear operations. It shows how to identify performance bottlenecks with PyTorch's profiler and how to achieve speedups through kernel fusion, with practical code examples and benchmarking results for fused MLP implementations.
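As a starting point, a baseline profiling run might look like the following. This is a minimal sketch, not the post's own code. The `MLP` class, its layer sizes, the GELU activation, and the `mlp_forward` label are all hypothetical choices made for illustration. It profiles on CPU only so that it runs without a GPU.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity


class MLP(nn.Module):
    """Hypothetical unfused baseline: two nn.Linear layers around an activation."""

    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


torch.manual_seed(0)
model = MLP().eval()
x = torch.randn(64, 256)  # assumed batch of 64 vectors of width 256

with torch.no_grad():
    # Warm up so one-time setup costs do not dominate the trace.
    for _ in range(3):
        model(x)

    # Record CPU operator activity for a single forward pass.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        with record_function("mlp_forward"):
            out = model(x)

# Aggregate events by operator name and show the most expensive ones.
events = prof.key_averages()
op_names = [e.key for e in events]
print(events.table(sort_by="cpu_time_total", row_limit=10))
```

In the printed table, each nn.Linear call shows up as its own operator entry, which makes the per-layer cost visible before any fusion is attempted. On a CUDA machine, adding `ProfilerActivity.CUDA` to `activities` would also record GPU kernel times.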