Writing an LLM from scratch, part 32g -- Interventions: weight tying
The article examines weight tying in LLMs, a technique that reduces the parameter count by sharing a single weight matrix between the token-embedding layer and the output projection. The author tests the approach on a GPT-2-style model to see whether it improves performance, despite research suggesting that it typically worsens model quality.
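A minimal sketch of the idea, in plain Python with hypothetical names (no framework assumed): a single matrix `W` serves both as the embedding table on the input side (row lookup) and, reused as its own transpose, as the output projection that maps a hidden state to per-token logits.

```python
vocab_size, d_model = 4, 3

# Shared weight matrix: vocab_size x d_model (integer values for clarity).
W = [[(i + 1) * (j + 1) for j in range(d_model)] for i in range(vocab_size)]

def embed(token_id):
    # Input side: the embedding of a token is its row of W.
    return W[token_id]

def logits(hidden):
    # Output side: project the hidden state with the *same* matrix,
    # i.e. logits[i] = hidden . W[i], so no separate output matrix exists.
    return [sum(h * w for h, w in zip(hidden, row)) for row in W]

h = embed(2)        # for illustration, use an embedding as the hidden state
scores = logits(h)  # one logit per vocabulary token, all from the shared W
```

Because both directions read from the same object, any gradient update to the output projection also changes the input embeddings, which is the coupling the article investigates.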