Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces
The authors argue against anthropomorphizing intermediate tokens in LLMs (often called "reasoning" or "thinking" traces), warning that such framing leads to misleading interpretations of model behavior and over-attribution of human-like cognitive processes to what are fundamentally statistical computations during token generation.
Background
- This paper critiques the widespread practice in AI research of treating the intermediate tokens that a large language model (LLM) generates before its final answer, often called its "chain of thought", as though they were traces of human-like reasoning or thinking; in other words, anthropomorphizing them.
- The authors argue that these tokens are just learned computational patterns, not evidence of genuine reasoning, self-awareness, or introspection, and that framing them as such misleads researchers and the public about what LLMs are actually doing.
- This matters because the AI field increasingly uses terms like "reasoning" and "thinking" for models (e.g., OpenAI's o1 or DeepSeek-R1), and the debate over whether LLMs truly "reason" has real stakes for safety, interpretability, and regulation: if we misunderstand what we are building, we misjudge its risks and capabilities.