Speculative decoding yields negative gains for Qwen3.6-35B-A3B on the RTX 3090
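Speculative decoding usually speeds up large language model inference, but real-world tests of the Qwen3.6-35B-A3B model on an RTX 3090 GPU show that it actually reduces performance, making this a case of "negative gains". The result shows that whether an algorithmic optimization pays off depends on the hardware it runs on.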
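The summary does not say why speculative decoding lost in this test. As a hedged illustration of how it *can* lose, the sketch below uses the standard expected-speedup model from Leviathan et al. (2023): with per-token acceptance rate `alpha`, draft length `gamma`, and draft-to-target cost ratio `c`, each verification step emits (1 - alpha^(gamma+1)) / (1 - alpha) tokens on average and costs gamma*c + 1 target-pass equivalents. The parameter values are made up, not measurements from the RTX 3090 test, and the model assumes verifying gamma + 1 tokens costs about as much as one ordinary decoding step.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target verification step, assuming each
    draft token is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return float(gamma + 1)  # all draft tokens accepted, plus one bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Speedup over plain decoding (values below 1.0 mean a slowdown).

    c is the cost of one draft forward pass relative to one target pass;
    one step costs gamma draft passes plus one target verification pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


if __name__ == "__main__":
    # Illustrative (made-up) settings, not measurements from the article.
    for alpha, gamma, c in [(0.8, 4, 0.05), (0.5, 4, 0.5)]:
        print(f"alpha={alpha}, gamma={gamma}, c={c}: "
              f"speedup ~ {expected_speedup(alpha, gamma, c):.2f}x")
```

The first setting (cheap draft, high acceptance) prints a speedup of about 2.80x; the second (low acceptance, relatively expensive draft) prints about 0.65x, which is what a "negative gain" looks like in this model.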
The article provides a command-line recipe for transcribing audio files on macOS using the Gemma 4 E2B model with MLX and mlx-vlm. It walks through transcribing a 14-second WAV file and notes a few minor errors in the output.
The article explains how to package Perl and shell scripts for deployment on NixOS, covering dependency management and reproducible builds. It demonstrates creating Nix expressions to handle Perl modules and shell dependencies in the Nix ecosystem.
When working with 24-bit-per-pixel formats on video cards with bank-switched memory, code had to use aligned memory accesses even though the pixels themselves were not aligned, a requirement imposed by the hardware constraints of bank-switched video memory.
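A minimal sketch of the idea, assuming a linear framebuffer, a 64 KiB bank window, and 32-bit aligned accesses (real cards differed in window size and access width; all names here are hypothetical). One plausible reading of the constraint: because the bank size is a multiple of 4, an aligned 32-bit access never crosses a bank boundary, while a 3-byte pixel can. The function splits one pixel into the aligned word accesses that cover it, each tagged with the bank that must be mapped in first.

```python
from typing import List, Tuple

BANK_SIZE = 64 * 1024   # assumed size of the bank-switched window, in bytes
BYTES_PER_PIXEL = 3     # 24 bits per pixel
WORD = 4                # assumed width of an aligned access, in bytes

Access = Tuple[int, int, Tuple[int, ...]]  # (bank, offset in bank, byte lanes used)


def pixel_accesses(x: int, y: int, pitch: int) -> List[Access]:
    """Return the aligned word accesses needed to touch pixel (x, y).

    Each entry names the bank to select, the aligned offset inside that
    bank, and which byte lanes of the 4-byte word belong to the pixel.
    """
    start = y * pitch + x * BYTES_PER_PIXEL  # first byte of the pixel
    end = start + BYTES_PER_PIXEL            # one past the last byte
    accesses: List[Access] = []
    word = start - start % WORD              # aligned word containing the first byte
    while word < end:
        lo = max(start, word) - word         # first lane of this word used by the pixel
        hi = min(end, word + WORD) - word    # one past the last lane used
        bank, offset = divmod(word, BANK_SIZE)  # an aligned word never spans two banks
        accesses.append((bank, offset, tuple(range(lo, hi))))
        word += WORD
    return accesses


if __name__ == "__main__":
    # Pixel starting at byte 65535: one byte in bank 0, two bytes in bank 1.
    print(pixel_accesses(21845, 0, 98304))
```

With a 98,304-byte pitch, the pixel at x = 21845, y = 0 starts at byte 65,535, so the example prints `[(0, 65532, (3,)), (1, 0, (0, 1))]`: one aligned access in bank 0 and one in bank 1, with a bank switch in between.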
llm-openrouter 0.6 adds a new "llm openrouter refresh" command that allows users to refresh the list of available models without waiting for cache expiration. This feature was added to enable immediate access to new models like Kimi 2.6 on OpenRouter.
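The summary doesn't describe how llm-openrouter stores its model list. The sketch below is a hypothetical, generic time-to-live cache (class and method names are invented, not the plugin's code) showing why a manual refresh matters: `get()` keeps returning the cached list until the TTL lapses, while `refresh()` refetches immediately, which is the role the new `llm openrouter refresh` command plays.

```python
import time
from typing import Callable, List, Optional


class ModelListCache:
    """Hypothetical TTL cache for a remote model list (illustration only)."""

    def __init__(self, fetch: Callable[[], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # callable that downloads the model list
        self._ttl = ttl_seconds      # how long a fetched list stays valid
        self._clock = clock          # injectable clock, handy for testing
        self._models: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get(self) -> List[str]:
        # Serve the cached list until it expires, then refetch.
        expired = self._clock() - self._fetched_at >= self._ttl
        if self._models is None or expired:
            return self.refresh()
        return self._models

    def refresh(self) -> List[str]:
        # Force a fetch regardless of expiry.
        self._models = list(self._fetch())
        self._fetched_at = self._clock()
        return self._models


if __name__ == "__main__":
    remote = ["model-a"]
    now = [0.0]
    cache = ModelListCache(lambda: list(remote), ttl_seconds=3600,
                           clock=lambda: now[0])
    print(cache.get())          # ['model-a']
    remote.append("model-b")    # a new model appears upstream
    now[0] = 60.0
    print(cache.get())          # still ['model-a']: cache has not expired
    print(cache.refresh())      # ['model-a', 'model-b']
```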