LLM from scratch (32l) – Interventions: Updated instruction fine-tuning results
In the 32-layer LLM-from-scratch project, the instruction fine-tuning results have been updated. This report presents the latest experiments validating the effect of the intervention techniques and demonstrating the resulting improvement in model performance.
The author completed training a GPT-2-like model in 44 hours on a local machine, reaching performance close to GPT-2 small. Through systematic testing of various interventions, they identified learning rate adjustments and dropout removal as the most effective at reducing the model's loss. The author's next goal is to implement an LLM from scratch in JAX without referring back to their book.
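The two winning interventions amount to small configuration changes. A minimal sketch of how such an experiment might be set up, using hypothetical hyperparameter names and values (the author's exact configuration is not given in the source):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical GPT-2-small-style hyperparameters; illustrative values only.
    n_layers: int = 32
    learning_rate: float = 3e-4
    dropout: float = 0.1

baseline = TrainConfig()

# The interventions reported as most effective: adjust the learning rate
# and remove dropout entirely (dropout probability set to 0.0).
intervened = replace(baseline, learning_rate=1e-3, dropout=0.0)

print(baseline)
print(intervened)
```

Keeping the configuration immutable and deriving each intervention with `replace` makes ablation runs easy to compare, since every variant differs from the baseline in exactly the fields listed.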
Updated instruction fine-tuning tests on GPT-2-style models showed that OpenAI's models performed best. Some custom models with similar test-loss scores nonetheless varied unexpectedly in instruction-following ability, and no clear pattern emerged across the tested models.