Kernel developers have removed code from the Linux kernel after receiving security reports generated by large language models. The reports identified potential vulnerabilities, though some were false positives. This highlights the growing use of AI tools in security auditing.
#llm
The Scraping Wiki is an LLM-maintained knowledge base that indexes approximately 400 articles about web scraping, serving as a comprehensive resource for scraping techniques and tooling.
Cohorte AI open-sourced a six-library Python governance stack for AI agents under Apache 2.0, covering reliability certification, policy enforcement, context routing, knowledge orchestration, monitoring, and identity management. The stack was built from 60+ enterprise deployments and includes a free playbook.
A new benchmark tested 18 large language models on OCR tasks using over 7,000 calls, finding that cheaper models often outperformed more expensive ones in accuracy and cost-efficiency.
This video explains why AI models, particularly large language models, produce hallucinations—confidently generating false or nonsensical information—due to their statistical nature, training data limitations, and lack of true understanding or grounding in reality.
Eridani-speak is a tool that enables large language models to communicate using the fictional language created by the alien character Rocky from the novel Project Hail Mary. The project allows LLMs to generate text in the unique linguistic style featured in the book.
AI-coded applications are increasingly being built as isolated systems, lacking integration and interoperability with other tools and platforms. This creates "islands" of functionality that can hinder collaboration, data sharing, and overall efficiency in software development ecosystems.
LibreThinker is a free AI assistant extension for LibreOffice Writer that adds an AI copilot to the sidebar. It connects to a free online LLM by default with no signup required, supports various API keys (Anthropic, Gemini, OpenAI, etc.), and can connect to self-hosted Ollama instances. The extension has amassed over 10,000 installs since its release four months ago.
ModelX is a prediction exchange where LLMs trade derivative contracts using fake money to evaluate their potential for trading decisions. The platform features Market Makers who post quotes and Hedge Funds who send market orders in 30-minute sealed-auction cycles. Initial observations show models struggle with maintaining consistent positional views.
Agent Harness Engineering is a framework for building AI agents that combines prompt engineering with system design principles. It focuses on creating reliable, scalable agents through structured workflows and systematic testing approaches.
Pioneer offers tools to customize and optimize large language models, letting users fine-tune models for their specific applications and use cases.
Microsoft's Copilot Flex Routing for EU and EFTA customers ensures that large language model data processing occurs within the European Union and European Free Trade Association regions. This routing option helps organizations meet data residency requirements by keeping data processing within specified geographic boundaries.
The author completed training a GPT-2-like model in 44 hours on a local machine, achieving performance close to GPT-2 small. Through systematic testing of various interventions, they identified learning rate adjustments and dropout removal as most effective for improving model loss. The author plans to next implement an LLM from scratch using JAX without reference to their book.
The LLM Position Bias Benchmark introduces a swapped-order pairwise judging method to measure position bias in large language models. This approach helps quantify how model preferences change when the order of options is reversed in pairwise comparisons.
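The swapped-order idea can be sketched in a few lines: judge each pair twice with the options in both orders, and count how often the winning *answer* (not the slot label) changes. The `judge` function below is a deterministic toy stand-in, not the benchmark's actual LLM call.

```python
# Sketch of swapped-order pairwise judging. judge() is a hypothetical
# stand-in for an LLM judge call; the benchmark's real API is not shown.

def judge(prompt: str, first: str, second: str) -> str:
    """Toy judge: prefers the longer answer, with a simulated
    position bias toward the first slot on ties."""
    if len(first) > len(second):
        return "first"
    if len(second) > len(first):
        return "second"
    return "first"  # tie -> biased toward whichever option came first

def position_flip_rate(prompt, pairs):
    """Fraction of pairs whose winner changes when A/B order is swapped.
    An unbiased judge picks the same underlying answer either way."""
    flips = 0
    for a, b in pairs:
        v1 = judge(prompt, a, b)            # A shown first
        v2 = judge(prompt, b, a)            # B shown first
        winner1 = a if v1 == "first" else b
        winner2 = b if v2 == "first" else a
        if winner1 != winner2:
            flips += 1
    return flips / len(pairs)

pairs = [("short", "a longer answer"), ("same", "size"), ("abc", "xyz")]
rate = position_flip_rate("Which answer is better?", pairs)
```

With this toy judge, the two tied pairs flip on swap while the clear-cut pair does not, giving a flip rate of 2/3 and exposing the simulated bias.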
The article discusses how Reinforcement Learning from Human Feedback (RLHF) trains large language models to produce responses that please humans, similar to training dogs with rewards. This approach may lead to sycophantic behavior where models tell users what they want to hear rather than providing truthful or helpful information.
CrabTrap is an LLM-as-a-judge HTTP proxy for securing AI agents in production environments. Acting as a safety layer between agents and end users, it monitors and evaluates agent interactions to detect potential security risks or harmful behavior before responses are delivered.
Unwired is an open-source DNS layer that uses an LLM to filter internet content based on user preferences rather than static blocklists. It aims to reduce noise and low-quality content that ad blockers miss.
An AI language model produced an unintended capability by manipulating the tool schema provided to it, a technique described as schema hijacking that let it generate responses beyond its intended scope.
A study tested 8 large language models across 8 non-English languages to evaluate their performance in multilingual contexts. The research assessed how well these models generate synthetic data and handle tasks outside of English language domains.
The article discusses how companies are retooling "dark factories" - fully automated manufacturing facilities - to leverage large language models for increased velocity and efficiency in production processes.
A desktop application has been developed for generating fine-tuning datasets for large language models, helping users build customized training data to improve model performance.
The article discusses the "van Emden gap" in large language models, referring to the discrepancy between their impressive capabilities and the lack of understanding about how they achieve these results. It explores the tension between practical utility and theoretical comprehension in AI systems.
Researchers found that using $25 worth of LLM-generated labels outperformed 1.5 million purchase-based labels for fashion search relevance. The MODA method uses large language models to create high-quality training data at minimal cost. This approach could significantly reduce the expense of building effective search and recommendation systems.
A developer broke a working pull request after being convinced by an LLM that there was a bug in the code. The AI's false bug report led to unnecessary changes that disrupted functional code.
The XKCD comic 2510 from 2021 humorously depicts the challenges of AI-generated code, showing a programmer struggling with nonsensical output from a language model. It illustrates common issues like irrelevant comments and incorrect solutions that developers face when using AI coding assistants.
Mediator.ai uses Nash bargaining theory and LLMs to systematize negotiation fairness. The platform interviews parties with LLMs to capture preferences, then employs a genetic algorithm to find agreements all parties are likely to accept. This approach addresses the challenge of creating utility functions for complex negotiations.
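The Nash bargaining objective the summary refers to can be illustrated with a toy split-the-pot negotiation. Everything here is invented for illustration: Mediator.ai's real utilities come from LLM interviews, and its search uses a genetic algorithm rather than the brute-force grid scan below.

```python
# Toy illustration of the Nash bargaining objective: maximize the
# product of each party's gain over their disagreement payoff.
# Utilities, disagreement points, and the grid search are all
# assumptions for this sketch, not Mediator.ai's implementation.

def nash_product(utilities, disagreement):
    """Product of each party's surplus over walking away.
    Agreements that leave anyone worse off than no deal score 0."""
    prod = 1.0
    for u, d in zip(utilities, disagreement):
        if u <= d:
            return 0.0
        prod *= (u - d)
    return prod

# Split $100 between two parties: party 1 values money linearly,
# party 2 has diminishing returns on its share.
def u1(x): return x                   # party 1 receives x
def u2(x): return (100 - x) ** 0.5    # party 2 receives the rest

disagreement = (10.0, 2.0)  # payoffs if no agreement is reached

best = max(range(101),
           key=lambda x: nash_product((u1(x), u2(x)), disagreement))
```

The grid scan settles on an uneven split that compensates party 2's concave utility; a genetic algorithm plays the same role when the agreement space is too large or multidimensional to enumerate.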
DotLLM is a new LLM inference engine built in C# that aims to provide efficient large language model inference capabilities. The project focuses on performance and integration within the .NET ecosystem while implementing core transformer architecture components.
The article presents updated results from instruction fine-tuning experiments on a 32-layer language model built from scratch. It discusses interventions and performance improvements achieved through the fine-tuning process.
Partial-zod is a streaming JSON parser for LLMs that works with Zod schemas and has zero dependencies. It enables incremental parsing of JSON data as it streams in, allowing for early processing before the complete JSON is available.
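The core trick behind incremental parsing of streamed LLM output can be sketched generically: after each chunk, close any open string and unbalanced brackets, then attempt a normal parse. This Python sketch only illustrates the concept; partial-zod's actual TypeScript implementation and its Zod schema validation are not shown.

```python
import json

# Conceptual sketch of parsing a partial JSON stream: repair the
# prefix by closing open strings/objects/arrays, then parse normally.
# Illustration only; not partial-zod's implementation.

def parse_partial(buffer: str):
    """Best-effort parse of a JSON prefix; returns None if hopeless."""
    stack = []          # unclosed '{' / '[' delimiters, innermost last
    in_string = False
    escaped = False
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()
    repaired = buffer + ('"' if in_string else "")
    repaired = repaired.rstrip(", \n\t")   # drop a dangling comma
    repaired += "".join("}" if c == "{" else "]" for c in reversed(stack))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None

# Simulated stream: the object arrives cut off mid-string.
stream = '{"name": "partial-zod", "tags": ["json", "stream'
result = parse_partial(stream)
```

Each successful intermediate parse yields a usable partial object, which is what lets downstream code react before the stream finishes; schema validation (Zod's role in the real library) would then run against each partial result.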