microsoft/VibeVoice

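Microsoft has open-sourced VibeVoice, its Whisper-style speech-to-text model, under the MIT license, with speaker diarization built in. The article's author used the mlx-audio tool on a Mac to run a 4-bit quantized version of the model (only 5.71GB) against a one-hour podcast recording. Processing took about 8 minutes 45 seconds, with peak memory of roughly 30-60GB. The output JSON contains timestamps, text, and a speaker_id for each segment, which makes it easy to browse in Datasette Lite. The model currently handles at most one hour of audio, so longer recordings have to be split into segments and processed separately.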
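As a rough illustration of the two practical points above (the one-hour limit and the per-segment `speaker_id`), here is a minimal Python sketch. It does not call VibeVoice or mlx-audio. The helper names `plan_chunks` and `speaking_time` are hypothetical, and the `start` and `end` field names are assumptions about the JSON layout; the summary only confirms that each segment carries timestamps, text, and `speaker_id`.

```python
from collections import defaultdict

# One-hour limit taken from the summary above; treat it as a tunable assumption.
MAX_CHUNK_SECONDS = 60 * 60


def plan_chunks(total_seconds, max_seconds=MAX_CHUNK_SECONDS):
    """Return (start, end) offsets in seconds that cover the whole recording
    in consecutive chunks, none longer than max_seconds."""
    chunks = []
    start = 0.0
    total = float(total_seconds)
    while start < total:
        end = min(start + max_seconds, total)
        chunks.append((start, end))
        start = end
    return chunks


def speaking_time(segments):
    """Sum the duration of each speaker's segments.

    Assumes every segment is a dict with "start", "end" (seconds) and
    "speaker_id" keys; the first two names are hypothetical.
    """
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_id"]] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    # A 90-minute recording needs two passes under a one-hour limit.
    print(plan_chunks(90 * 60))

    # Made-up segments in the assumed shape, for illustration only.
    sample = [
        {"start": 0.0, "end": 2.5, "speaker_id": 0, "text": "Hello."},
        {"start": 2.5, "end": 4.0, "speaker_id": 1, "text": "Hi there."},
        {"start": 4.0, "end": 5.0, "speaker_id": 0, "text": "Welcome."},
    ]
    print(speaking_time(sample))
```

One caveat when splitting: if each chunk is transcribed independently, speaker labels are probably not guaranteed to match across chunks, so speaker 0 in one chunk may not be speaker 0 in the next. Reconciling them would need an extra step that this sketch does not attempt.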

Related stories

Why Claude's new 1M context length is a big deal

PHP will get an AOT compiler from the Swoole team in 2027

LLM 0.32a0 is a major backwards-compatible refactor

Show HN: A free tool for non-technical folks to easily publish a website

Show HN: Free Live Speech Translator
