Show HN: I trained a 32B model that outperforms Opus 4 at credit card optimization
I built an RL environment for credit card rewards optimization and trained Qwen 32B with GRPO. The trained model scored about 0.51 on held-out tasks, beating Opus 4's ~0.41 and GPT-4o's 0.36. The environment is open-sourced under Apache 2.0.
Anthropic has introduced a 1 million token context window for its Claude Opus 4.6 and Sonnet 4.6 models, offered at no additional charge to users.
Anthropic publishes Claude system prompts as Markdown; the author split them into separate files with fabricated git commit dates so the changes could be browsed through GitHub's history view, enabling detailed diffs between model versions such as Opus 4.6 and 4.7.
Figma's dependence on non-designer seats made it particularly vulnerable to AI disruption. The launch of Claude Design further exacerbates this challenge for the company.
The Claude Token Counter tool has been upgraded to compare token counts across different Claude models. Opus 4.7 uses an updated tokenizer that increases token counts by 1.46x for text and up to 3.01x for images compared to Opus 4.6, potentially making it about 40% more expensive despite identical pricing.
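The cost effect follows directly from the multipliers: at identical per-token pricing, a request that tokenizes to more tokens simply costs proportionally more. A minimal sketch of that arithmetic, using the 1.46x and 3.01x multipliers from the comparison above (the token counts in the usage example are hypothetical, and real billing blends input/output rates this sketch ignores):

```python
# Effective cost increase when a new tokenizer inflates token counts
# but the per-token price stays the same.
TEXT_MULTIPLIER = 1.46   # Opus 4.7 vs 4.6, text (from the comparison above)
IMAGE_MULTIPLIER = 3.01  # Opus 4.7 vs 4.6, images, worst case

def effective_cost_increase(text_tokens: int, image_tokens: int) -> float:
    """Fractional cost increase for a request, at identical per-token pricing."""
    old_tokens = text_tokens + image_tokens
    new_tokens = text_tokens * TEXT_MULTIPLIER + image_tokens * IMAGE_MULTIPLIER
    return new_tokens / old_tokens - 1

# A text-only workload tracks the text multiplier exactly:
print(f"{effective_cost_increase(10_000, 0):.0%}")      # → 46%
# Image-heavy workloads are hit harder:
print(f"{effective_cost_increase(9_000, 1_000):.1%}")
```

The headline "about 40% more expensive" figure presumably reflects a mostly-text workload mix; any images in the request push the effective increase well above that.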
The author tested Qwen3.6-35B-A3B and Claude Opus 4.7 on a "pelican riding a bicycle" benchmark. Qwen3.6 produced the better SVG illustration, rendering a correct bicycle frame where Opus 4.7 failed to. The humorous benchmark has generally correlated with model usefulness.