InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
InferenceBench is a new benchmark that evaluates how well AI agents can optimize their own inference strategies for open-ended questions. It moves beyond standard single-response accuracy tests by measuring cost-efficiency and iterative improvement.