TopicTracker

Source: HackerNews

METR Can Barely Measure Claude Mythos: 50% Time Horizon Now Exceeds 16 Hours

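According to Hugonomy, METR (Model Evaluation and Threat Research) had difficulty evaluating Claude Mythos because the model's 50% time horizon, the length of task (measured in human working time) that it completes about half the time, now exceeds 16 hours. The figure suggests that Claude Mythos performs strongly on long-running autonomous tasks, far beyond the range in which earlier models were evaluated.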

Related coverage

Misplaced panic over AI progress
4.0
Gary Marcus critiques the interpretation of METR's latest "time horizon" graph, arguing that fears over rapid AI progress are misplaced. He breaks down what the data actually shows versus the overblown claims about AI taking over human tasks, emphasizing that the graph measures specific task completion times rather than general intelligence or autonomous capabilities.