The Incomplete-Data Trap That Lab Benchmarks Conceal
The JAMA study's core finding reframes the AI diagnostics conversation in a way that performance leaderboards cannot absorb: the failure is structural, not marginal. When patient information is complete, frontier models perform well — some exceed 90% final-diagnosis accuracy. But complete patient data is a condition that describes late-stage clinical encounters, not the triage moment when a decision about urgency actually matters. Every model tested failed that earlier, harder task more than 80% of the time .
This is the trap that benchmark results obscuring real triage failure rates allow hospitals to walk into. A vendor can present a 90%-accuracy headline without lying — it simply selects the condition under which the number holds. Procurement teams that do not ask 'accuracy under what information conditions?' are buying a product whose failure mode is most acute precisely when clinical stakes are highest. The JAMA study names the question procurement has not been asking.