Show HN: I trained a 32B model that outperforms Opus 4 at credit card optimization
I built an RL environment for credit card rewards optimization and trained Qwen 32B with GRPO. The trained model scored about 0.51 on held-out tasks, beating Opus 4's ~0.41 and GPT-4o's 0.36. The environment is open-sourced under Apache 2.0.
Anthropic has introduced a 1 million token context window for its Claude Opus 4.6 and Sonnet 4.6 models, offered at no additional charge to users.
Anthropic publishes Claude system prompts as Markdown; the author split them into separate files with fabricated git commit dates so the changes could be browsed through GitHub's history view, enabling detailed diffs between model versions such as Opus 4.6 and 4.7.
Figma's dependence on non-designer seats made it particularly vulnerable to AI disruption. The launch of Claude Design further exacerbates this challenge for the company.
The Claude Token Counter tool has been upgraded to compare token counts across different Claude models. Opus 4.7 uses an updated tokenizer that increases token counts by 1.46x for text and up to 3.01x for images compared to Opus 4.6, potentially making it about 40% more expensive despite identical pricing.
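The cost effect follows directly from the multipliers: at identical per-token pricing, a request that tokenizes to more tokens simply costs proportionally more. A minimal sketch of that arithmetic, using the 1.46x and 3.01x multipliers from the comparison above (the token counts in the usage example are hypothetical, and real billing blends input/output rates this sketch ignores):

```python
# Effective cost increase when a new tokenizer inflates token counts
# but the per-token price stays the same.
TEXT_MULTIPLIER = 1.46   # Opus 4.7 vs 4.6, text (from the comparison above)
IMAGE_MULTIPLIER = 3.01  # Opus 4.7 vs 4.6, images, worst case

def effective_cost_increase(text_tokens: int, image_tokens: int) -> float:
    """Fractional cost increase for a request, at identical per-token pricing."""
    old_tokens = text_tokens + image_tokens
    new_tokens = text_tokens * TEXT_MULTIPLIER + image_tokens * IMAGE_MULTIPLIER
    return new_tokens / old_tokens - 1

# A text-only workload tracks the text multiplier exactly:
print(f"{effective_cost_increase(10_000, 0):.0%}")      # → 46%
# Image-heavy workloads are hit harder:
print(f"{effective_cost_increase(9_000, 1_000):.1%}")
```

The headline "about 40% more expensive" figure presumably reflects a mostly-text workload mix; any images in the request push the effective increase well above that.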
The author tested Qwen3.6-35B-A3B and Claude Opus 4.7 on a "pelican riding a bicycle" benchmark. Qwen3.6 produced the better SVG illustration, rendering a correct bicycle frame where Opus 4.7 failed to. The humorous benchmark has generally correlated with model usefulness.