RLVR might be disproportionately bad at science
The article suggests that reinforcement learning with verifiable rewards (RLVR) could be particularly problematic for scientific progress. The verification loop for a scientific theory can take decades or even centuries to close, and even then, a better theory can sometimes make worse predictions than an inferior one, so a reward tied to predictive accuracy could end up favoring the weaker theory.