Interpretable Coreference Resolution Evaluation Using Explicit Semantics
A new interpretable evaluation framework for coreference resolution uses explicit semantics rather than span-matching metrics. It assesses coreference link properties for transparent diagnostics and correlates with human judgments.
Background
- Coreference resolution is a natural language processing (NLP) task in which systems determine which expressions (mentions) in a text refer to the same real-world entity (e.g., linking "she" and "Mary" or "the car" and "it").
- This paper presents a new evaluation method for coreference resolution that goes beyond a single right/wrong score by using explicit semantic analysis: it checks whether the system correctly captures the meaning of the referents, not just their surface form.
- The work was accepted at ACL 2026, a leading academic conference in computational linguistics, and is positioned as a methodological contribution to the field.
- Most existing evaluation metrics for this task (e.g., MUC, B³, CEAF) are opaque: each yields a single aggregate score without explaining what kinds of mistakes the system made. This work aims to make evaluation more interpretable by decomposing performance along semantic dimensions (e.g., gender, number, animacy, or more fine-grained categories).
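
To make the idea of decomposing performance along a semantic dimension concrete, here is a minimal Python sketch. It is not the paper's method: it assumes a simple link-based view (every pair of mentions in the same cluster counts as a coreference link) and a hypothetical animacy lookup table, and reports precision and recall per category instead of one aggregate score. The names MENTION_ANIMACY, cluster_links, animacy_category, and per_category_scores are all illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical semantic annotation: animacy of each mention (an assumption for illustration).
MENTION_ANIMACY = {
    "Mary": "animate",
    "she": "animate",
    "her": "animate",
    "the car": "inanimate",
    "it": "inanimate",
}


def cluster_links(clusters):
    """Return the set of unordered mention pairs implied by a clustering."""
    links = set()
    for cluster in clusters:
        # Sorting makes each pair canonical, so (a, b) and (b, a) are the same link.
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


def animacy_category(link):
    """Label a link by the animacy of its two mentions, or 'mixed' if they differ."""
    first, second = (MENTION_ANIMACY[m] for m in link)
    return first if first == second else "mixed"


def per_category_scores(gold, predicted, category_of):
    """Compute link precision and recall separately for each semantic category."""
    gold_links = cluster_links(gold)
    pred_links = cluster_links(predicted)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for link in gold_links | pred_links:
        category = category_of(link)
        if link in gold_links and link in pred_links:
            counts[category]["tp"] += 1
        elif link in pred_links:
            counts[category]["fp"] += 1
        else:
            counts[category]["fn"] += 1

    report = {}
    for category, c in counts.items():
        predicted_total = c["tp"] + c["fp"]
        gold_total = c["tp"] + c["fn"]
        # By convention in this sketch, an undefined ratio is reported as 0.0.
        report[category] = {
            "precision": c["tp"] / predicted_total if predicted_total else 0.0,
            "recall": c["tp"] / gold_total if gold_total else 0.0,
        }
    return report


gold = [{"Mary", "she", "her"}, {"the car", "it"}]
predicted = [{"Mary", "she"}, {"the car", "it", "her"}]  # "her" wrongly joins the car cluster

report = per_category_scores(gold, predicted, animacy_category)
for category in sorted(report):
    print(category, report[category])
```

In this toy case the breakdown shows that both predicted links in the "mixed" category are wrong, meaning the system linked an animate pronoun to inanimate mentions. That is the kind of diagnostic a single aggregate score would hide.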