From Hacker News

We got 207 tok/s with Qwen3.5-27B on an RTX 3090

Luce-Org reports a throughput of 207 tokens per second running the Qwen3.5-27B model on an RTX 3090 GPU. The figure is a performance benchmark for a 27-billion-parameter language model on a consumer graphics card.

Related stories

  • The article provides a command-line recipe for transcribing audio files on macOS using the Gemma 4 E2B model with MLX and mlx-vlm. It demonstrates the transcription of a 14-second WAV file, noting minor misinterpretations in the output.