GoLongRL: Capability-Oriented Long Context RL with Multitask Alignment
GoLongRL is a reinforcement learning framework designed to improve the long-context capabilities of language models through multitask alignment. Its capability-oriented training approach balances performance across different long-context tasks.