
METR can barely measure Claude Mythos – 50% task horizon now exceeds 16 hours

METR's latest evaluation finds that Claude Mythos's 50% task horizon now exceeds 16 hours, making it increasingly difficult for current benchmarks to measure the model's capabilities accurately.

Misplaced panic over AI progress
Gary Marcus critiques how METR's latest "time horizon" graph is being interpreted, arguing that fears over rapid AI progress are misplaced. He contrasts what the data actually shows with what he considers overblown claims about AI taking over human tasks, emphasizing that the graph measures specific task completion times rather than general intelligence or autonomous capabilities.
