LLMs Predict My Coffee
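The author explores whether large language models can predict the outcome of brewing coffee, and considers why one might try using AI to predict a coffee's taste and quality rather than benchmarking it through physical experiments.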
Gemini can identify public figures in images, while ChatGPT and Claude currently do not offer this capability, a functional difference between major AI models in how they handle image recognition of people.
The article examines whether modern large language models can accurately count the number of 'b's in the word "blueberry," testing their ability to handle this specific adversarial question.
The article presents benchmark results for Gemini 3 Flash, comparing its performance across various tasks including reasoning, coding, and mathematics against other large language models. The updated evaluation provides insights into the model's capabilities and relative strengths in different domains.
The article examines whether large language models are actually improving, analyzing recent benchmark results and questioning if apparent progress is real or just due to test data contamination. It discusses the challenges of measuring true capability gains versus superficial improvements.