GoLongRL: Capability-Oriented Long Context RL with Multitask Alignment
GoLongRL is a capability-oriented long-context reinforcement learning framework designed to improve language models' ability to handle long sequences. It introduces multitask alignment strategies that keep performance balanced across diverse long-context tasks, addressing the task-specific degradation found in existing long-context training approaches.
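To make the idea of balancing tasks concrete, here is a minimal sketch of one generic multitask RL technique: standardizing rewards within each task, so that a task with a large reward scale does not dominate the policy update. This is an illustration only. The summary above does not specify GoLongRL's actual alignment strategies, and the function name `normalize_rewards_per_task`, the task names, and the reward values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_task(samples, eps=1e-8):
    """Standardize rewards within each task (hypothetical helper).

    samples: list of (task_name, reward) pairs from one training batch.
    Returns a list of normalized rewards in the same order as `samples`.
    """
    # Group raw rewards by task so each task gets its own statistics.
    rewards_by_task = defaultdict(list)
    for task, reward in samples:
        rewards_by_task[task].append(reward)

    # Per-task mean and population standard deviation.
    stats = {
        task: (mean(rewards), pstdev(rewards))
        for task, rewards in rewards_by_task.items()
    }

    # Subtract the task mean and divide by the task std (eps avoids
    # division by zero when every reward in a task is identical).
    return [
        (reward - stats[task][0]) / (stats[task][1] + eps)
        for task, reward in samples
    ]


# Illustrative batch: two tasks with very different reward scales.
batch = [
    ("retrieval", 0.0),
    ("retrieval", 1.0),
    ("summarization", 10.0),
    ("summarization", 30.0),
]
advantages = normalize_rewards_per_task(batch)
```

After normalization, both tasks contribute signals of comparable magnitude, even though the raw summarization rewards are more than ten times larger than the retrieval rewards.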