Why it takes months to tell if new AI models are good

Evaluating new AI models takes months because standard benchmarks are unreliable and often gamed by companies. Real-world testing requires significant time and effort, while subjective "vibe checks" provide limited insight. As a result, it is hard to tell whether AI progress is stagnating or models are genuinely improving.