Show HN: We trained a 32B model to beat Opus 4 at credit card optimization

Researchers trained a 32B Qwen model with GRPO reinforcement learning to optimize credit card rewards. The model scored 0.51 on held-out tasks, ahead of Opus 4 (0.41) and GPT-4o (0.36). The training environment is open source under the Apache 2.0 license.

Related stories

Why Claude's new 1M context length is a big deal

Claude system prompts as a git timeline

Figma's woes compound with Claude Design

Claude Token Counter, now with model comparisons

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
