TAG · #TESTING

#testing

30 items

Show HN: Openleetcode – Run LeetCode solutions locally with open tests
2.0
OpenLeetCode is an open-source tool that lets developers run LeetCode solutions locally using open, community-contributed tests, without needing a LeetCode subscription.
hn · Jul 3, 2026 · #Tech
Testing Claude Sonnet 5's agentic claims
0.5
A developer tested Claude Sonnet 5's agentic capabilities using the Puter.js framework, evaluating its ability to autonomously complete coding tasks. The article details the setup and results of these tests, examining whether the model's claimed improvements in agentic behavior hold up in practical use.
hn · Jul 3, 2026 · #Tech
Show HN: Mirrors – test AI agent changes by replaying real production traces
4.0
Mirrors is a new tool that lets developers test changes to AI agents by replaying real production traces, enabling safe validation before deployment.
hn · Jul 2, 2026 · #Tech
Battleborn Battery Fire Aftermath and More Testing [video][5 Mins]
1.0
The video shows the aftermath of a Battleborn battery fire and further testing of the damaged battery, examining its condition and performance after the incident.
hn · Jul 2, 2026 · #Tech
Show HN: OSS Tests to Fix AI Gen Code. 110 Test for Major API – Supabase, Auth0
3.5
A developer open-sourced a collection of 110 tests for major APIs like Supabase and Auth0 to catch bad code generated by AI tools. The tests are based on official documentation and aim to catch common issues such as writable user metadata and exposed service role keys in client code.
hn · Jul 2, 2026 · #Tech
Help Test Bahriya – A New Distributed Container Cloud
2.0
A new distributed container cloud platform called Bahriya is being developed, and the project is seeking community help with testing. The cloud aims to provide a distributed infrastructure for running containerized workloads.
hn · Jul 2, 2026 · #Tech
React Testing Questions That Trip Up Engineers
1.0
The article covers common React testing interview questions that often challenge engineers, focusing on topics like testing hooks, async components, and mocking. It highlights practical testing strategies and pitfalls developers encounter during technical interviews for React roles.
hn · Jul 1, 2026 · #Tech
The test suite was the incident
4.0
A poorly made change to the test suite itself caused a production outage. The author argues that test suites should be treated as critical infrastructure, with changes reviewed as rigorously as production code to prevent such incidents.
hn · Jul 1, 2026 · #Tech
Show HN: Openleetcode – LeetCode runner where tests live in the repo
2.0
Openleetcode v1.0.0 is a tool that runs LeetCode-style tests directly from a repository, allowing users to execute test cases stored alongside the code rather than on the LeetCode platform.
hn · Jun 30, 2026 · #Tech
Show HN: Evaluation Context Protocol (ECP)
3.0
Evaluation Context Protocol (ECP) is a vendor-neutral, open protocol designed for portable evaluations of AI agents. It enables testing of agent outputs, tool calls, and audit context across different frameworks, models, evaluation platforms, and CI systems.
hn · Jun 30, 2026 · #Tech
Using Playwright to test my static sites
1.0
The author describes using Playwright, a browser automation tool, to write and run tests for their static websites. They explain how Playwright enables them to programmatically interact with pages, verify content, and check for broken links or rendering issues, making testing more reliable than manual checks.
hn · Jun 30, 2026 · #Tech
Show HN: Ocarina – Automate and test MCP servers from YAML, no LLM
2.0
Ocarina is a new tool that lets users automate and test MCP (Model Context Protocol) servers using YAML-based scripts called "Rondos," without requiring an LLM. It allows inspecting server capabilities, chaining tool calls, and validating outputs step-by-step, similar to Ansible playbooks. The creator built it to test MCP servers in their job and sees broader potential in the MCP ecosystem.
hn · Jun 27, 2026 · #Tech
Looking for digital nomads to test VPN with streaming services
0.5
A call is open for digital nomads to test a VPN service focused on compatibility with streaming platforms.
hn · Jun 27, 2026 · #Tech
Baguette: Headless iOS Simulator control via private SimulatorKit APIs
2.0
Baguette is an open-source tool that provides headless control over iOS Simulators via private SimulatorKit APIs, enabling automation tasks without GUI interaction on macOS.
hn · Jun 26, 2026 · #Tech
QA/Testing at Startups
0.5
Startups struggle to maintain quality when moving fast, especially with AI-generated code. Even with automated tests, critical bugs are often caught manually by customers or internal teams.
hn · Jun 26, 2026 · #Tech
Testing CIMD support across Anthropic's Claude products
0.5
The article tests whether Anthropic's Claude products support CIMD (Client ID Metadata Documents, an OAuth mechanism for identifying clients) and documents how each product handles it.
hn · Jun 26, 2026 · #Tech
It took two weeks to make Claude's "overnight solution" for flaky tests useful
2.5
A developer spent two weeks refining an AI-generated (Claude) script meant to fix flaky tests overnight, finding the initial solution required significant debugging and customization before it became useful in practice.
hn · Jun 26, 2026 · #Tech
Show HN: AssertGo – Fluent Assertion Library for Go
2.0
The developer introduces AssertGo, a fluent assertion library for Go inspired by AssertJ, leveraging Go 1.27's generic methods to replace any-based top-level methods. The library provides chainable assertions for strings, integers, slices, maps, and types, with all design choices made by the author and code generated with Claude Sonnet in incremental commits.
hn · Jun 25, 2026 · #Tech
Show HN: TakoQA – A harness to get a swarm of agents to break your application
4.0
TakoQA is an open-source tool that uses a swarm of AI agents to automatically test applications by simulating real-world user behavior, aiming to find bugs, edge cases, and breaking points before they reach production.
hn · Jun 25, 2026 · #Tech
Show HN: Docket Fleet – mobile device cloud
5.0
Docket Fleet, a mobile device cloud from YC startup Docket (P25), launches in alpha. Built for agentic use cases like automated QA, RPA, and scraping, it offers improved UX for manual interaction. It supports iOS simulators, Android emulators, and back-end infrastructure for Windows and macOS apps, alongside HTTP tunnels for private networks.
hn · Jun 24, 2026 · #Tech
Jest/Vitest interactive course (runs in the browser)
0.0
The page offers an interactive browser-based course teaching Jest and Vitest fundamentals for testing JavaScript applications.
hn · Jun 24, 2026 · #Tech
Are AI chatbots like ChatGPT politically biased? We tested them
5.0
The Washington Post conducted tests on major AI chatbots, including ChatGPT, to assess political bias. The tests found that most chatbots exhibited a left-leaning bias on a range of political and social topics. The results raise questions about the neutrality of AI systems that are increasingly used for information.
hn · Jun 24, 2026 · #Tech
BigQuery Emulator (Bqemulator)
2.0
Bqemulator is a community-maintained, open-source emulator for Google BigQuery that allows developers to run and test SQL queries locally without needing a cloud connection. It supports a range of BigQuery features including standard SQL syntax, views, user-defined functions, and transactional operations while providing fast feedback for development workflows.
hn · Jun 24, 2026 · #Tech
Are ChatGPT and other AI chatbots politically biased? We tested them
6.5
The Washington Post tested six major AI chatbots and found they consistently produce responses that lean left on a wide range of political topics, from abortion and gun rights to immigration and climate change, regardless of how questions were phrased.
hn · Jun 24, 2026 · #Tech
OpenUser: Self-hosted user-persona tester for AI coding agents
2.5
OpenUser is a self-hosted, open-source tool that lets AI coding agents autonomously test features by simulating user personas in a browser, collecting console logs, network logs, and checkpoints. It runs locally with any coding agent and model, designed to replace manual testing in development loops.
hn · Jun 23, 2026 · #Tech
Find the questions your RAG pipeline will fail on, before your users do
1.0
ragProbe is a tool designed to identify questions that a RAG (Retrieval-Augmented Generation) pipeline will answer incorrectly, allowing developers to detect failures before users encounter them.
hn · Jun 23, 2026 · #Tech
What Breaks When You Skip the Harness
2.0
The article discusses the risks and consequences of skipping the "harness" (testing infrastructure) in software development, explaining how this shortcut can lead to undetected bugs, difficult debugging, and fragile code that breaks unpredictably in production.
hn · Jun 23, 2026 · #Tech
Show HN: A local rig to test if AI social simulation predicts reality
4.0
A developer created Mirofish, a local testing rig that evaluates how well AI social simulations predict real-world outcomes. The tool allows users to run controlled experiments comparing simulated AI behavior against actual results, aiming to improve the reliability of AI-driven social models.
hn · Jun 22, 2026 · #Tech
DisplayMate
1.0
DisplayMate is a website that provides expert reviews and analysis of display technologies, including TVs, monitors, tablets, and smartphones. It specializes in in-depth testing and calibration of screen performance, covering factors like brightness, color accuracy, and contrast. The site is known for its detailed technical evaluations and comparison charts.
hn · Jun 22, 2026 · #Tech
Show HN: ZeroDrop – Disposable email inboxes for CI pipelines (no Docker)
2.0
ZeroDrop provides disposable email inboxes designed for CI/CD pipelines, enabling automated email testing without Docker. The service creates temporary email addresses to verify sign-up flows, password resets, and other email-based processes in integration tests.
hn · Jun 22, 2026 · #Tech
