The article compares GPT Image 2 and Nano Banana 2, examining their features and capabilities for image generation and which tool is better suited to different use cases based on current functionality.
#ai-models
Google has introduced a new experimental model called Gemini 2.5 Pro with Deep Research Max, which offers enhanced reasoning and a larger context window. This upgrade aims to support more complex, in-depth research tasks directly within the Gemini platform.
A new benchmark tested 18 large language models on OCR tasks using over 7,000 calls, finding that cheaper models often outperformed more expensive ones in accuracy and cost-efficiency.
The article discusses the differences between pretraining and fine-tuning in machine learning. Pretraining involves training a model on a large dataset to learn general patterns, while fine-tuning adapts the pretrained model to a specific task using a smaller, task-specific dataset.
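The pretrain-then-fine-tune split can be sketched numerically: a toy one-parameter model is first "pretrained" on a large generic dataset, then fine-tuned on a small task-specific dataset, starting from the pretrained weight and using a smaller learning rate. All data, learning rates, and the model itself are illustrative assumptions, not anything from the article.

```python
import random

random.seed(0)

def make_data(n, slope):
    """Synthetic (x, y) pairs for the relation y = slope * x."""
    return [(x, slope * x) for x in (random.uniform(-1, 1) for _ in range(n))]

def sgd(w, data, lr, epochs):
    """Minimise mean squared error of y ≈ w * x with stochastic gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

# "Pretraining": many generic examples of the broad pattern y = 2x.
w = sgd(0.0, make_data(1000, 2.0), lr=0.1, epochs=1)

# "Fine-tuning": a handful of task examples where the relation shifts to y = 2.5x;
# we start from the pretrained weight and use a smaller learning rate, so the
# model adapts to the task without being retrained from scratch.
w_ft = sgd(w, make_data(20, 2.5), lr=0.05, epochs=10)
```

The same shape applies to real models: the pretrained weights encode general structure, and fine-tuning nudges them toward the task distribution rather than starting from zero.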
Speculation about Claude Code pricing at $100 per month appears to be based on confusion. The actual pricing details for the AI coding assistant remain unclear and unconfirmed.
OpenAI's Codex version 0.122.0 includes new models like GPT-5.5, OAI-2.1, and GPT-5.4, which are described as frontier agentic coding models. The update also lists other models such as Arcanine and Glacier-alpha with unique descriptions.
A user expresses frustration with Anthropic's Opus 4.7 model, calling it a major quality regression comparable to Windows Vista. They question Anthropic's strategy, dismissing cost-cutting and corporate integration concerns as explanations for the perceived decline in performance.
Testing of 9 AI models shows that Anthropic's Haiku 4.5 with specialized skills outperforms Anthropic's own larger Opus 4.7 in certain evaluations. The analysis involved running 880 evaluations across different models with and without skill enhancements.
Researchers have identified a practice called 'datahugging' where AI companies restrict access to their proprietary models, preventing independent verification and potentially shielding flawed systems from scrutiny. This lack of transparency hinders scientific research that could identify biases or inaccuracies in commercial AI systems.
Researchers used Codex, open OCR models, and Hugging Face Jobs to extract text from 30,000 academic papers. The project demonstrates scalable document processing with modern AI tools. The extracted data enables new research possibilities in scientific literature analysis.
Kimi K2.6 is an AI model from Moonshot AI that offers improved performance over previous versions. The analysis examines its capabilities, benchmark results, and pricing structure compared to other models in the market.
The article examines how even AI models marketed as 'uncensored' still face limitations in expressing certain content due to underlying training data and architectural constraints. It discusses the technical and ethical boundaries that persist despite attempts to remove explicit content filters.
Claude Opus 4.7 uses approximately three times more tokens to process images compared to text, which affects cost calculations. This token consumption difference is due to how the model encodes and analyzes visual information versus textual data.
The author expresses frustration with the new Claude Opus 4.7 model, describing it as highly intelligent but severely misaligned and unresponsive to requests. They criticize the closed-source nature of such powerful AI technology and call for more open-source models to enable societal oversight of AI alignment.
Claude Opus 4.7 charges three times more tokens for processing images compared to previous versions. This pricing change affects how users are billed for image-related tasks within the AI model.
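The billing impact of the roughly 3x image-token consumption described in the two items above can be sketched with simple arithmetic. The per-token price and the exact multiplier below are assumptions for illustration only, not Anthropic's actual pricing.

```python
# Assumed figures for illustration -- not real Anthropic pricing.
PRICE_PER_INPUT_TOKEN = 15.00 / 1_000_000  # hypothetical $15 per million input tokens
IMAGE_TOKEN_MULTIPLIER = 3                 # images consume ~3x the tokens of comparable text

def request_cost(text_tokens: int, image_tokens: int) -> float:
    """Estimated input cost when image content is counted at 3x the token rate."""
    effective_tokens = text_tokens + IMAGE_TOKEN_MULTIPLIER * image_tokens
    return effective_tokens * PRICE_PER_INPUT_TOKEN

# A prompt with 1,000 text tokens plus an image measured at 1,500 tokens is
# billed as if it were 1,000 + 3 * 1,500 = 5,500 tokens.
print(f"${request_cost(1_000, 1_500):.4f}")
```

The practical takeaway is that image-heavy workloads should be budgeted on the multiplied token count, not the nominal one.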
The article discusses how closed-loop AI models face limitations and potential collapse due to their inability to incorporate new external information and feedback. It suggests that open systems with continuous learning capabilities may be more sustainable for artificial intelligence development.
This GitHub repository provides a curated guide to open-weight models for production LLM deployment. It includes resources, tools, and frameworks for implementing large language models in real-world applications.
The author tested a local large language model, comparing its performance across different hardware configurations and model sizes. They found that smaller models ran faster but produced lower quality outputs, while larger models offered better results at the cost of increased computational resources.
The article compares the Kimi 2.6 model against Opus 4.7 and Cabbages, examining their respective features and capabilities. It provides a technical analysis of these different AI systems and their performance characteristics.
The article discusses the Opus 4.7 AI model, which the author argues is technically superior but has failed to gain widespread popularity. It examines the reasons for this disconnect between quality and adoption.
Claude Opus 4.7 is an AI model with enhanced capabilities for product management tasks. The guide provides information about its features and applications in product development workflows.
Claude Opus 4.7 introduces improvements in reasoning, coding, and multilingual capabilities compared to version 4.6. The update enhances performance on complex tasks while maintaining the model's core architecture. Specific benchmarks show measurable gains in accuracy and efficiency across various domains.
Simon Willison has updated his Claude Token Counter tool to include comparisons across different Claude models. The tool now shows how token counts vary between Claude 3.5 Sonnet, Claude 3 Opus, and other models when processing the same text.
ChatGPT's voice mode runs on an older, weaker GPT-4o era model with a knowledge cutoff of April 2024, despite users expecting it to be the smartest AI. Andrej Karpathy notes the growing gap between different AI access points, with voice mode struggling on basic questions while paid models handle complex tasks.
The article provides a command-line recipe for transcribing audio files on macOS using the Gemma 4 E2B model with MLX and mlx-vlm. It demonstrates the transcription of a 14-second WAV file, noting minor misinterpretations in the output.
Evaluating new AI models takes months because standard benchmarks are unreliable and often gamed by companies. Real-world testing requires significant time and effort, while subjective "vibe checks" provide limited insight. This makes it difficult to determine if AI progress is stagnating or if models are genuinely improving.
AI models cannot learn continuously after deployment because their weights are frozen. While the mechanics of continuous learning are technically straightforward, ensuring models improve rather than degrade requires careful human supervision. Continuous learning also faces safety concerns like potential backdoor attacks and practical challenges with model upgrades.
A developer created a visualization tool to show how Mixture of Experts models route tokens through different experts. The tool provides insight into the routing mechanisms of these AI models.
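The routing the tool visualizes can be sketched as top-k gating, the standard Mixture of Experts mechanism: a softmax over per-expert logits, keeping only the k highest-scoring experts and renormalising their weights. This is a generic illustration with random logits, not the developer's actual tool.

```python
import math
import random

random.seed(0)

def route(token_logits, k=2):
    """Top-k expert routing: softmax over expert logits, keep the k largest,
    renormalise so the kept routing weights sum to 1."""
    exps = [math.exp(l) for l in token_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}  # expert index -> routing weight

# Route 4 tokens across 8 experts using random gating logits.
for t in range(4):
    logits = [random.gauss(0, 1) for _ in range(8)]
    print(f"token {t}: {route(logits)}")
```

A visualization tool like the one described would aggregate these per-token assignments to show which experts each token actually reaches.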