From johndcook.com

Gaussian distributed weights for LLMs

This post examines NF4, a 4-bit "NormalFloat" data type used for quantizing LLM weights, and higher-precision analogs of it. NF4 and FP4 are the two 4-bit quantization types supported by bitsandbytes, and both are common in quantized weights downloaded from Hugging Face.
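The idea behind NF4 is that trained LLM weights are approximately Gaussian, so the sixteen quantization levels are placed at quantiles of a standard normal distribution rather than spaced like ordinary floating point values. The sketch below illustrates the quantile construction using only the standard library; the exact NF4 codebook from the QLoRA paper is built somewhat differently (it is assembled asymmetrically so that it contains an exact zero), so treat this as the idea, not the published table.

```python
from statistics import NormalDist

def normal_quantile_levels(bits: int = 4) -> list[float]:
    """Sketch of the NormalFloat idea: place the 2^bits quantization
    levels at quantiles of a standard normal, then rescale so the
    levels span [-1, 1].  Not the exact QLoRA NF4 codebook."""
    n = 2 ** bits
    nd = NormalDist()
    # Evenly spaced probabilities, offset by half a step so the
    # endpoints avoid the infinite tails of the normal distribution.
    probs = [(i + 0.5) / n for i in range(n)]
    q = [nd.inv_cdf(p) for p in probs]
    scale = max(abs(q[0]), abs(q[-1]))
    return [v / scale for v in q]

levels = normal_quantile_levels()
print(levels)  # 16 levels, dense near 0, sparse in the tails
```

Because the normal density is concentrated near zero, the resulting levels cluster around zero and spread out in the tails, which matches where the weight values actually live.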

Related stories

  • FP4 is a 4-bit floating point format that uses 1 sign bit, 2 exponent bits, and 1 mantissa bit. It has limited precision and dynamic range, making it suitable for specialized applications like AI inference where memory bandwidth is constrained.
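The 1-2-1 bit layout described above (often called E2M1) can be decoded exhaustively; with an exponent bias of 1 and subnormals when the exponent field is zero, which is the usual E2M1 convention (assumed here, not stated in the post), the format represents only the values ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}:

```python
def fp4_e2m1_value(code: int) -> float:
    """Decode a 4-bit FP4 (E2M1) code: 1 sign bit, 2 exponent bits,
    1 mantissa bit.  Exponent bias 1 and subnormal handling are the
    usual E2M1 convention, assumed here."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: 0.m * 2^(1 - bias) = m * 0.5
        return sign * man * 0.5
    return sign * (1.0 + man * 0.5) * 2.0 ** (exp - 1)

# +0 and -0 coincide, so 16 codes give 15 distinct values.
values = sorted({fp4_e2m1_value(c) for c in range(16)})
print(values)
```

Enumerating the codes makes the limited dynamic range concrete: the largest representable magnitude is 6, and there are only four values strictly between 0 and 2.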

  • USearch introduces FP8 (8-bit floating point) support for vector search and KV-caching, enabling more efficient memory usage and faster computations. The implementation allows for reduced storage requirements while maintaining search accuracy through quantization techniques.