microsoft/VibeVoice

Microsoft released VibeVoice, an MIT-licensed speech-to-text model with built-in speaker diarization. In a test on a MacBook Pro, it transcribed one hour of audio in about 9 minutes, using up to 61.5 GB of RAM. The model outputs JSON with text, timestamps, and speaker IDs, but each run is limited to one hour of audio.
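
As a rough illustration of working with diarized JSON output of this kind, here is a minimal Python sketch. The field names (`start`, `end`, `speaker`, `text`) and the sample data are assumptions made up for the example; they are not taken from VibeVoice's documented schema, so adjust them to whatever the model actually emits.

```python
import json
from collections import defaultdict

# Hypothetical sample output. The keys "start", "end", "speaker" and "text"
# are assumed for illustration and may not match VibeVoice's real schema.
SAMPLE = """
[
  {"start": 0.0, "end": 4.2, "speaker": 0, "text": "Welcome to the show."},
  {"start": 4.2, "end": 9.0, "speaker": 1, "text": "Thanks for having me."},
  {"start": 9.0, "end": 12.5, "speaker": 0, "text": "Let's get started."}
]
"""


def speaking_time(segments):
    """Sum the duration in seconds of all segments for each speaker ID."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def to_transcript(segments):
    """Render segments as one line each: start time, speaker, text."""
    return "\n".join(
        f"[{seg['start']:.1f}s] Speaker {seg['speaker']}: {seg['text']}"
        for seg in segments
    )


segments = json.loads(SAMPLE)
print(to_transcript(segments))
print(speaking_time(segments))
```

Because each run is limited to one hour of audio, longer recordings would presumably need to be split first, with the timestamps of later chunks offset before merging the segment lists.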

Related stories

Why Claude's new 1M context length is a big deal

PHP will get an AOT compiler from the Swoole team in 2027

LLM 0.32a0 is a major backwards-compatible refactor

Show HN: A free tool for non-technical folks to easily publish a website

Show HN: Free Live Speech Translator
