So, Where Does Next-Token Prediction Leave Us?
Next-token prediction has proven surprisingly effective for training capable language models, but the approach faces fundamental limitations in planning, reasoning, and factual accuracy that scaling alone may not resolve.