The Funding Paradox at the Base of Open Training Data
Wikipedia's new commercial partnerships establish a precedent that the open AI community has not yet fully absorbed: the most important free-text corpus in existence is now a paid product for the labs that train on it most heavily. That shift matters beyond Wikipedia itself. As practitioners building on real-time AI co-pilot infrastructure and similar open-source tooling know, the quality of foundational training data propagates silently into every downstream model. If fewer readers means fewer people invested enough to contribute, Wikipedia loses editors, and that degradation is quieter and slower than a licensing dispute, and harder to reverse. The big tech partners listed in the deal are, in the assessment of at least one observer, conspicuously underpaying for what they consume.