A Good Accuracy Rate Is the Wrong Metric at This Scale
The Oumi analysis exposes a category error in how accuracy is conventionally reported for AI search features. Ninety-one percent correct sounds like a high standard; at the volume at which AI Overviews are served, the remaining nine percent amounts to tens of millions of incorrect answers generated every hour, which makes the figure an argument for the severity of the problem rather than its manageability. The wrong answers are not randomly distributed across low-stakes queries; they appear with the same confidence and the same placement as the correct ones, giving users no signal to tell them apart. Google's design treats accuracy as a product metric; the Oumi findings suggest it should be treated as an epidemiological one.