Gaussian-Distributed Weights in Large Language Models
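This article explores 4-bit floating-point formats such as NF4 and their higher-precision variants, which are common 4-bit data types in the bitsandbytes library. When you download LLM weights quantized to 4 bits from Hugging Face, they may be in NF4 or FP4 format.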
FP4 is a 4-bit floating-point format that uses 1 sign bit, 2 exponent bits, and 1 mantissa bit. With only 16 bit patterns, it has very limited precision and dynamic range, so it is mainly used where memory bandwidth is the bottleneck, such as AI inference.
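To make the 1-2-1 bit layout concrete, here is a minimal Python sketch that decodes all 16 codes and rounds a real number to the nearest one. It assumes the E2M1 convention (exponent bias of 1, subnormals when the exponent field is 0, no infinities or NaNs); the text above does not specify these details, and the FP4 lookup table used by bitsandbytes may order or scale its values differently. The function names are hypothetical and chosen only for illustration.

```python
def decode_fp4_e2m1(code: int) -> float:
    """Decode a 4-bit code laid out as [sign(1) | exponent(2) | mantissa(1)].

    Assumption: exponent bias is 1 and every bit pattern is a finite number.
    """
    if not 0 <= code <= 0xF:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the value is 0 or 0.5.
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1, one fractional bit worth 0.5.
        magnitude = (1.0 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


# All 16 representable values, indexed by their 4-bit code.
FP4_VALUES = [decode_fp4_e2m1(c) for c in range(16)]


def quantize_fp4_e2m1(x: float) -> int:
    """Return the 4-bit code whose decoded value is nearest to x."""
    return min(range(16), key=lambda c: abs(FP4_VALUES[c] - x))


if __name__ == "__main__":
    print(FP4_VALUES[:8])           # non-negative half of the value grid
    print(quantize_fp4_e2m1(2.4))   # code of the closest representable value
```

Under these assumptions the non-negative values are 0, 0.5, 1, 1.5, 2, 3, 4, and 6, which shows how coarse the grid is. In practice, 4-bit schemes pair such codes with a per-block scale factor so that the grid covers the actual range of the weights.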
USearch introduces FP8 (8-bit floating-point) support for vector search and KV-caching, which reduces memory usage and speeds up computation. Quantizing vectors to FP8 lowers storage requirements while maintaining search accuracy.