RFC: Artificial Contributors to Open Source
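Written as a Best Current Practice, this document examines how AI contributors participate in open-source projects. It proposes standards and guidelines that define the role of AI in open-source collaboration, with the aim of keeping projects healthy while balancing automation against community governance.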
InferenceBench is a new benchmark designed to evaluate how well AI agents can optimize their own inference strategies for open-ended questions, moving beyond standard single-response accuracy tests by measuring cost-efficiency and iterative improvement.
MemEye is a visual-centric evaluation framework designed to assess the memory capabilities of multimodal AI agents by testing how well they retain and recall visual information across interactions. The framework provides a structured benchmark to measure an agent's ability to remember past visual inputs, supporting research into more persistent and context-aware multimodal AI systems.