TAG · #GPU


HOTNESS

Occupancy Math on the AMD MI355X GPU (CDNA4): A From-First-Principles Guide
3.0
A first-principles guide to calculating occupancy on the AMD MI355X GPU (CDNA4), covering hardware constraints such as compute units, shared memory, registers, and wavefront limits, with worked examples and an occupancy calculator tool.
hn · Jul 9, 2026 · #Tech
Slughorn, Slug font/glyph render lib for OpenGL/OSG/Vulkan/GPU APIs (MIT lic)
1.0
Slughorn is a new MIT-licensed, GPU-agnostic font and glyph rendering library designed for use with OpenGL, OSG, Vulkan, and other GPU-driven graphics APIs. It aims to provide high-quality text rendering across multiple graphics backends.
hn · Jul 8, 2026 · #Tech
FlashAttention-4: Algorithm and Kernel Pipelining
5.0
FlashAttention-4 introduces a new algorithm and kernel pipelining design that addresses asymmetric hardware scaling, improving performance on modern GPU architectures by better managing memory and compute resources.
hn · Jul 3, 2026 · #Tech
Matrix Multiplication on Blackwell
4.0
First in a series on optimizing matrix multiplication for NVIDIA's Blackwell GPU. Covers architecture features (GB202/GB203 dies, new SM partitioning, enhanced Tensor Cores) and methodology using CUDA and low-level assembly tuning for peak performance.
hn · Jul 3, 2026 · #Tech
Understanding Latency Hiding on GPUs [pdf]
1.0
This technical report examines latency hiding techniques on Graphics Processing Units (GPUs), analyzing how GPUs hide memory and instruction latency through massive multithreading and warp scheduling. The authors present a detailed study of GPU pipeline behavior and quantify the effectiveness of different latency hiding mechanisms across various workloads.
hn · Jul 3, 2026 · #Tech
Matrix Multiplication on Blackwell
4.0
This article introduces a series on matrix multiplication optimization for NVIDIA's Blackwell architecture, explaining the importance of efficient matrix math for AI workloads and outlining the hardware advancements in Blackwell that enable faster computation compared to previous architectures.
hn · Jul 2, 2026 · #Tech
FlashAttention-4: Algorithm and Kernel Pipelining
6.0
FlashAttention-4 introduces a co-designed algorithm and kernel pipelining approach to improve attention computation on hardware with asymmetric scaling properties, enhancing efficiency and throughput on modern accelerators.
hn · Jul 2, 2026 · #Tech
Borrowing the Night: Reclaiming Idle Inference GPUs for Research
3.0
Runway introduces "Borrowing the Night," a system that repurposes idle GPU capacity from inference workloads to run research tasks overnight. By utilizing otherwise wasted compute resources, the approach aims to boost research efficiency without interfering with production inference services.
hn · Jul 2, 2026 · #Tech
Show HN: UATC – A Closed-Loop Controller to Prevent GPU OOM
4.0
UATC is a closed-loop controller designed to prevent GPU out-of-memory (OOM) errors by monitoring GPU memory usage and automatically cleaning up unused resources or adjusting allocation.
hn · Jul 2, 2026 · #Tech
Shader Benchmark for LLMs
2.0
Shader Benchmark is a tool designed to evaluate the performance of large language models (LLMs) on shader-related tasks, providing a standardized way to test and compare their capabilities in this specific domain.
hn · Jul 2, 2026 · #Tech
Nvidia Through a Crypto Miner's Eyes
1.0
A crypto miner reflects on a decade of watching Nvidia evolve from a GPU maker for gamers and miners to a trillion-dollar AI powerhouse, tracing the shift from general-purpose CUDA cores to specialized AI hardware and the cultural change from mining mania to the AI boom.
hn · Jul 2, 2026 · #Tech
Wgpu v30
3.0
wgpu v30.0.0, a new major release of the Rust graphics abstraction layer, is out, bringing new features, bug fixes, and API changes.
hn · Jul 1, 2026 · #Tech
Reduce GVisor Cold Starts with GPU Snapshotting
5.0
Cerebrium introduces a GPU memory snapshotting technique that reduces cold start times for CUDA workloads from minutes to seconds by saving and restoring GPU state across container restarts, improving efficiency for serverless GPU deployments.
hn · Jul 1, 2026 · #Tech
GPU Compute Tightness Index
4.0
Bargo AI has launched a GPU Compute Tightness Index, a metric designed to measure supply-demand dynamics and pricing pressure in the GPU cloud computing market. The index monitors real-time utilization and availability across major cloud providers to help users assess market tightness for AI workloads.
hn · Jun 30, 2026 · #Tech
Nvidia resurrects older graphics cards as RAM demands impact tech prices
3.0
Nvidia is bringing back older GPU models like the RTX 3060 to address rising RAM demands and help stabilize graphics card prices in the market.
hn · Jun 30, 2026 · #Tech
Accelerating LLM Inference on AMD GPUs with Low-Latency GEMMs
5.5
AMD introduces optimized low-latency GEMM kernels for LLM inference on AMD GPUs, reducing prefill and decode latency by up to 3.5x compared to standard implementations. The optimizations target memory-bound operations in transformer-based models, leveraging hardware-specific features to improve throughput and responsiveness in AI workloads.
hn · Jun 30, 2026 · #Tech
Investigating Linux Graphics (2025)
3.0
The article provides an in-depth investigation into the current state of Linux graphics in 2025, examining the various components, drivers, and technologies involved in rendering graphics on the platform.
hn · Jun 30, 2026 · #Tech
UATC – A Closed-Loop Controller to Prevent GPU OOM During LLM Training
4.0
UATC is a closed-loop controller designed to prevent GPU out-of-memory (OOM) errors during large language model (LLM) training by dynamically managing memory usage.
hn · Jun 30, 2026 · #Tech
Zluda 6 release (run unmodified CUDA applications on non-Nvidia GPUs)
4.0
ZLUDA 6 has been released, enabling users to run unmodified CUDA applications on non-Nvidia GPUs. This update covers work from the first two quarters of 2026 and includes improvements to compatibility and performance for running CUDA software on alternative hardware.
hn · Jun 30, 2026 · #Tech
TurboPrefill: 2.7× faster than llama.cpp Pipeline Parallel on Llama-3-70B
7.0
A new feature called TurboPrefill has been introduced to llama.cpp, achieving a 2.7× speedup over traditional pipeline-parallel processing for Llama-3-70B models.
hn · Jun 30, 2026 · #Tech
What Is Binning? A Basic Definition (2022)
0.5
Binning is a process used by manufacturers to sort chips (CPUs, GPUs, RAM) based on quality, performance, and power efficiency after production. Higher-binned parts can achieve higher clock speeds at lower voltages, while lower-binned parts are downclocked or sold as cheaper models. This practice explains why two identical-looking components can perform differently.
hn · Jun 30, 2026 · #Tech
Popping the GPU Bubble
5.0
The AI industry's GPU shortage is largely artificial, driven by hoarding and inefficient usage rather than genuine scarcity. Smaller, more efficient models and better resource scheduling could alleviate demand. The article predicts a correction where GPU prices fall and the bubble bursts, reshaping AI development priorities.
hn · Jun 30, 2026 · #Tech
WebGL Without a GPU
3.0
The article explores how WebGL rendering can be achieved without a physical GPU by using software implementations such as Google's SwiftShader, which emulates GPU operations on the CPU. It discusses the performance trade-offs, use cases for server-side rendering, and how tools like Puppeteer can leverage these setups for headless WebGL.
hn · Jun 29, 2026 · #Tech
What happens when you run a CUDA kernel?
2.0
This article explains the detailed process of launching and executing a CUDA GPU kernel, covering steps from CPU invocation and driver interaction to thread block scheduling on streaming multiprocessors (SMs), warp execution, memory access, and synchronization.
hn · Jun 29, 2026 · #Tech
2026 GPU Price Report
3.0
The 2026 GPU Price Report from CAST AI analyzes cloud GPU pricing trends, revealing significant cost variations across providers and regions. It highlights that on-demand GPU instances remain expensive, with savings possible through spot instances and reserved capacity, while also noting a shift toward newer, more efficient GPU models impacting overall pricing dynamics.
hn · Jun 29, 2026 · #Tech
Self-Destructing Graphics Cards
5.0
The article examines the persistent issue of melting power connectors in high-end graphics cards, particularly with Nvidia's RTX 4090, caused by improper cable seating, dust, or high current draw. It explains how these failures occur and offers practical safety tips, such as fully inserting connectors, avoiding sharp bends, and using quality PSU cables to reduce fire risk.
hn · Jun 28, 2026 · #Tech
What do we know about Nvidia Feynman Architecture in 2026
4.0
The Reddit post asks what is known about Nvidia's "Feynman" architecture as of 2026. The thread likely discusses rumored specs, performance expectations, and potential release timelines for this future GPU architecture, which Nvidia's roadmap places after Rubin, Blackwell's successor.
hn · Jun 28, 2026 · #Tech
McNUFFT – Nonuniform FFT for Apple Silicon GPUs via MLX
2.0
McNUFFT is a Python library implementing the Non-Uniform Fast Fourier Transform (NUFFT) for Apple Silicon GPUs using the MLX framework. It provides both NumPy-like and PyTorch-like APIs, supporting forward and adjoint transforms as well as gradients for machine learning workflows.
hn · Jun 27, 2026 · #Tech
Ask HN: MacBook vs. Dedicated GPU for LLM
1.0
A user asks the Hacker News community how MacBooks compare with dedicated GPUs for running large language models, and how to determine whether a given MacBook can run a specific model.
hn · Jun 27, 2026 · #Tech
VRAM Ghost Busting: Who You Gonna Close()?
1.0
The article discusses GPU memory management for AI workloads, explaining how to efficiently allocate and deallocate VRAM (video RAM) to prevent memory fragmentation and leakage. It introduces techniques like proper CUDA caching, tensor garbage collection, and using context managers to ensure memory is freed promptly, helping developers optimize VRAM usage in machine learning applications.
hn · Jun 26, 2026 · #Tech
