
LLMs Predict My Coffee

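A discussion of why benchmarks should be run against physical experiments. The post explores whether LLMs (large language models) can predict the properties of a coffee, and proposes evaluating their predictive ability by comparing the predictions with real experimental data.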
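A minimal sketch of the comparison step. It assumes each coffee property reduces to one number per sample: a value measured in the experiment and the value an LLM predicted for the same brew. The function names and the two metrics are illustrative assumptions, not the article's method. Mean absolute error shows how far off the predictions are. Pairwise ordering agreement shows whether the model at least ranks the samples correctly.

```python
from itertools import combinations
from statistics import mean


def mean_absolute_error(predicted: list[float], measured: list[float]) -> float:
    """Average absolute gap between predicted and measured values."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("predicted and measured must be non-empty and equal length")
    return mean(abs(p - m) for p, m in zip(predicted, measured))


def pairwise_order_agreement(predicted: list[float], measured: list[float]) -> float:
    """Fraction of sample pairs the prediction orders the same way as the measurement.

    Pairs tied in either list are skipped; returns 0.0 if no untied pairs exist.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have equal length")
    agree = total = 0
    for i, j in combinations(range(len(predicted)), 2):
        dp = predicted[i] - predicted[j]
        dm = measured[i] - measured[j]
        if dp == 0 or dm == 0:
            continue  # a tie carries no ordering information
        total += 1
        if (dp > 0) == (dm > 0):
            agree += 1
    return agree / total if total else 0.0
```

A model could score well on ordering while still having a large absolute error. This is why the sketch reports both numbers rather than a single score.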

LLMs can now identify public figures in images
Gemini can identify public figures in images, while ChatGPT and Claude currently do not offer this capability. This is a functional difference among the major AI models in how they handle images of people.
Can modern LLMs actually count the number of b's in "blueberry"?
The article examines whether modern large language models can accurately count the number of 'b's in the word "blueberry," testing their ability to handle this specific adversarial question.
Updated LLM Benchmark (Gemini 3 Flash)
The article presents benchmark results for Gemini 3 Flash, comparing its performance against other large language models on tasks including reasoning, coding, and mathematics. The updated evaluation shows where the model is relatively strong or weak across these domains.
Are LLMs not getting better?
The article examines whether large language models are actually improving, analyzing recent benchmark results and questioning if apparent progress is real or just due to test data contamination. It discusses the challenges of measuring true capability gains versus superficial improvements.
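The blueberry question works as a test because its correct answer can be computed exactly. Below is a minimal sketch of such a check. It assumes the model's reply states the count as digits. `grade_answer` and its first-integer parsing rule are hypothetical and are not taken from the linked article.

```python
import re


def count_letter(word: str, letter: str) -> int:
    """Exact ground truth: case-insensitive count of one letter in a word."""
    return word.lower().count(letter.lower())


def grade_answer(answer: str, word: str, letter: str) -> bool:
    """Hypothetical grader: take the first integer in a model's reply and
    compare it with the exact count. Replies with no digits count as wrong."""
    match = re.search(r"\d+", answer)
    return match is not None and int(match.group()) == count_letter(word, letter)
```

Spelled-out answers such as "two" are marked wrong by this simple rule. A real harness would need a more forgiving parser.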

Related articles