TopicTracker
From Hacker News

METR can barely measure Claude Mythos – 50% task horizon now exceeds 16 hours

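METR's latest evaluation finds that Claude Mythos's 50% task horizon (the length of tasks, measured by how long they take skilled humans, that the model completes with a 50% success rate) now exceeds 16 hours. At that level, current benchmarks struggle to measure the model's capabilities accurately.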

Related stories

  • Gary Marcus critiques the interpretation of METR's latest "time horizon" graph, arguing that fears over rapid AI progress are misplaced. He breaks down what the data actually shows versus the overblown claims about AI taking over human tasks, emphasizing that the graph measures specific task completion times rather than general intelligence or autonomous capabilities.