Five frontier LLMs disagree on 67% of 1k real-world fact-check claims
A study evaluating five frontier large language models (LLMs) on 1,000 real-world fact-checking claims found that the models disagreed on 67% of them, or about 670 claims. Disagreement on roughly two of every three claims points to substantial inconsistency in how different LLMs assess factual accuracy: the verdict an automated fact-checker returns can depend on which model it uses. That raises concerns about relying on LLMs for automated fact-checking.
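The summary does not say exactly how "disagreement" was counted. The sketch below assumes one common reading: a claim counts as disagreed on if the five models' verdicts are not unanimous. The function name `disagreement_rate` and the verdict labels are hypothetical, and the toy data is invented for illustration. It is not the study's data or method.

```python
from collections.abc import Sequence


def disagreement_rate(verdicts: Sequence[Sequence[str]]) -> float:
    """Return the fraction of claims on which the models are not unanimous.

    Assumption (not taken from the study): a claim is "disagreed on" if at
    least one model's verdict differs from the others.
    """
    if not verdicts:
        raise ValueError("need at least one claim")
    # A claim is contested when its verdicts contain more than one distinct label.
    contested = sum(1 for labels in verdicts if len(set(labels)) > 1)
    return contested / len(verdicts)


# Hypothetical toy data: one row per claim, one verdict per model (five models).
toy_verdicts = [
    ["true", "true", "true", "true", "true"],      # unanimous
    ["false", "false", "true", "false", "false"],  # one dissenting model
    ["true", "mixed", "false", "true", "mixed"],   # three-way split
    ["false", "false", "false", "false", "false"], # unanimous
]

print(f"{disagreement_rate(toy_verdicts):.0%}")  # 2 of 4 claims contested -> 50%
```

Under this definition, a single dissenting model is enough to mark a claim as contested. A stricter measure, such as requiring no majority verdict, would generally yield a lower rate. The summary does not say which definition the study used.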