Are LLMs not getting better?
The article examines whether large language models are actually improving, analyzing recent benchmark results and questioning whether apparent progress is real or an artifact of test-data contamination. It discusses the difficulty of distinguishing genuine capability gains from superficial improvements.