Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
This paper investigates how hidden biases, such as spurious correlations learned by deep neural networks, persist even after explicit attempts to remove them through fine-tuning or retraining. The analysis reveals the conditions under which subliminal learning occurs and transfers across tasks.