Token Entanglement in Subliminal Learning

This article explores how large language models can learn subliminal patterns in token sequences: individual tokens appear random, but the relationships among them encode hidden information. It also examines how this "token entanglement" affects model behavior, training, and safety.

Background

"Owls" (or "OWLS") is a research project from the Bau Lab at Northeastern University that explores how large language models (LLMs) encode information at the token level. This site accompanies a paper on "subliminal learning", meaning cases where models pick up patterns they were not explicitly trained on. "Token entanglement" refers to the finding that LLMs sometimes blend the representations of different tokens (words or subwords), or let them interfere with one another, in ways that affect behavior. The Bau Lab focuses on mechanistic interpretability (reverse-engineering the internals of neural networks) and model editing. This work matters because understanding how models internally represent and confuse concepts helps researchers build safer, more controllable AI systems.
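
One way to build intuition for how blended token representations could affect behavior is a toy next-token model. The sketch below is an illustration only, not the Bau Lab's method or code: the three-token vocabulary, the two-dimensional output vectors in UNEMBED, and the helper names next_token_probs and push_toward are all hypothetical. It assumes that two tokens are "entangled" when their output vectors point in nearly the same direction, so that nudging the model toward one token also raises the probability of the other.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical output vectors for a made-up 3-token vocabulary.
# These numbers are invented for illustration; they are not measured
# from any real model.
UNEMBED = {
    "owl": [1.0, 0.2],
    "087": [0.9, 0.3],   # nearly parallel to "owl": the entangled pair
    "cat": [-0.5, 1.0],  # points in a different direction
}

def next_token_probs(hidden):
    """Map a hidden state to a probability for each token."""
    tokens = list(UNEMBED)
    logits = [dot(hidden, UNEMBED[t]) for t in tokens]
    return dict(zip(tokens, softmax(logits)))

def push_toward(hidden, token, step):
    """Move the hidden state along one token's output vector."""
    return [h + step * e for h, e in zip(hidden, UNEMBED[token])]

# Start from a neutral hidden state, then nudge it toward "owl" only.
before = next_token_probs([0.0, 0.0])
after = next_token_probs(push_toward([0.0, 0.0], "owl", 2.0))

for tok in UNEMBED:
    print(f"{tok}: {before[tok]:.3f} -> {after[tok]:.3f}")
```

In this toy setup, the nudge targets only "owl", yet the probability of "087" rises as well while "cat" falls, because the first two vectors overlap. Real models have far larger vocabularies and hidden sizes, so this shows only the direction of the effect under the stated assumptions.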