On the Efficacy of PyTorch for High-Performance Computing

This paper evaluates PyTorch's performance in high-performance computing (HPC) contexts, analyzing its efficiency for large-scale scientific computing workloads compared to traditional HPC frameworks. It examines memory usage, computational throughput, and scalability across different hardware configurations.

Background

- This is an academic paper presented at PASC 2025, a leading conference on high-performance computing (HPC), which focuses on using supercomputers and large-scale parallel systems for science and engineering.
- PyTorch is the dominant deep-learning framework developed by Meta; it is widely used for AI/ML but was designed for flexibility, not raw computational speed on traditional HPC hardware.
- "High-Performance Computing" here refers to traditional scientific computing on large CPU-based clusters and supercomputers (e.g., those at national labs), as opposed to GPU-heavy AI training.
- The paper investigates whether PyTorch, which emphasizes ease of use and rapid prototyping, can efficiently translate AI operations into the kind of finely tuned, parallel code that HPC applications require to run at scale, testing its suitability as a general-purpose HPC tool, not just an AI one.