microsoft/VibeVoice
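Microsoft has open-sourced VibeVoice, its Whisper-style speech-to-text model, under the MIT license, with built-in speaker diarization. The article's author tested it on a one-hour podcast recording on a Mac, running a 4-bit quantized version of the model (only 5.71GB) through the mlx-audio tool. Processing took about 8 minutes 45 seconds, with peak memory use of roughly 30-60GB. The output JSON includes timestamps, text, and a speaker_id for each segment, which makes it easy to browse in Datasette Lite. The model currently handles at most one hour of audio, so longer recordings need to be split into segments.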
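Because the VibeVoice output is plain JSON with timestamps, text, and a speaker_id, it is easy to post-process before (or instead of) loading it into Datasette Lite. The sketch below is a minimal illustration, assuming the output is a list of segment objects; the `start`/`end` key names (in seconds) and the sample data are hypothetical, and the real mlx-audio/VibeVoice schema may differ. It renders a readable transcript and totals talk time per speaker.

```python
import json
from collections import defaultdict

# Hypothetical sample in an ASSUMED shape: a list of segments, each with
# start/end times in seconds, a speaker_id, and the transcribed text.
# The "start"/"end" key names are assumptions, not a documented schema.
SAMPLE = """[
  {"start": 0.0, "end": 4.5, "speaker_id": 0, "text": "Welcome to the show."},
  {"start": 4.5, "end": 9.0, "speaker_id": 1, "text": "Thanks for having me."},
  {"start": 9.0, "end": 12.0, "speaker_id": 0, "text": "Let's get started."}
]"""


def talk_time_by_speaker(segments):
    """Sum the duration (end - start) of every segment per speaker_id."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_id"]] += seg["end"] - seg["start"]
    return dict(totals)


def transcript_lines(segments):
    """Render each segment as an 'mm:ss [Speaker N] text' line."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(
            f"{minutes:02d}:{seconds:02d} [Speaker {seg['speaker_id']}] {seg['text']}"
        )
    return lines


if __name__ == "__main__":
    segments = json.loads(SAMPLE)
    for line in transcript_lines(segments):
        print(line)
    # Total seconds spoken per speaker.
    print(talk_time_by_speaker(segments))
```

If a recording longer than an hour is split into pieces, note that speaker_id values from separate runs are not guaranteed to refer to the same person, so per-speaker totals are only meaningful within a single chunk unless the speakers are matched up afterwards.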
Anthropic has introduced a 1 million token context window for its Claude Opus 4.6 and Sonnet 4.6 models and is offering the expanded capacity at no additional charge.
The Swoole team plans to develop an AOT (ahead-of-time) compiler for PHP, with a target release date of 2027. This compiler aims to improve PHP performance by compiling code to native machine code before execution.
LLM 0.32a0 introduces a major backwards-compatible refactor of the tool, improving its internal architecture while maintaining support for existing plugins and workflows.
Weejur is a free tool that provides a simple UI front-end for GitHub Pages, allowing non-technical users to publish websites by pasting HTML or uploading files without needing to use command-line tools or technical platforms.
A developer created a free live speech translator that uses Chrome's native APIs to translate audio from a microphone into other languages in real time within the browser.