This wasn’t a post I wanted to write, but Friday nights are where bad news goes to die, so here we are. It’s now been two months since I wrote the following letter to the Open Source Initiative (OSI) regarding their (then-upcoming) Open Source AI Definition (OSAID), which controversially did not require release of the source (i.e., the data). The letter was a last-ditch effort to encourage intervention before the definition was finalised. Despite these efforts, the OSI board unanimously approved the flawed definition just days later and announced it at the All Things Open conference.
The OSAID immediately faced strong resistance across the industry (see The New Stack’s Case Against OSI’s Open Source AI Definition and my own FAQ-style opinion, So, you want to write about the OSI’s Open Source AI Definition (OSAID)…), with the shared concern that it fails to protect the four essential freedoms of Free Software: to use, study, modify, and share. For AI, the data is the source, and without it, Open Source AI would lack the core freedoms that have underpinned the software ecosystem for decades. This omission risks creating a future where today’s models can’t form the foundation of tomorrow’s, and where the next generation of Open Source developers, like those here at the non-profit Kwaai Open Source AI Lab (where I lead development of the Personal Artificial Intelligence Operating System, pAI-OS), will for the first time in 25 years be unable to “stand on the shoulders” of the generations before them. It’s akin to the plight of modern farmers forced to rely on sterile genetically modified (GM) seeds, which prevent the reuse and innovation that drive sustainable progress.
AI is transforming the software landscape, with more and more software written by or incorporating AI every day. The OSAID, which defines AI systems as software that “infers, from the input it receives, how to generate outputs,” effectively covers almost all software while conflicting with the existing Open Source Definition (the OSD has known gaps in its handling of data, including AI models, but these can be fixed without a new definition). This creates a dangerous overlap of two opposing definitions, both “stewarded” by the same organisation. Such a fundamental conflict seems poised to be resolved by “harmonising” the two definitions, an outcome that risks irreparably undermining the principles of Open Source.
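To see just how broad that definition is, consider a trivial, hand-written program (a hypothetical illustration of my own, not an example from the OSAID itself): by a plain reading it “infers, from the input it receives, how to generate outputs”, despite involving no machine learning whatsoever.

```python
def thermostat(reading_celsius: float) -> str:
    # A few hand-written rules, with no training, no model, and no
    # learning of any kind. Yet this program arguably "infers, from
    # the input it receives, how to generate outputs" -- the OSAID's
    # definition of an AI system.
    if reading_celsius > 25.0:
        return "cooling"
    if reading_celsius < 18.0:
        return "heating"
    return "idle"

print(thermostat(30.0))  # -> cooling
```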
The following letter, written before the OSAID’s approval, highlights these concerns and calls for greater care and community involvement to counter the internal threats Open Source now faces from the very organisations meant to protect it. Names have been omitted to focus on the issues rather than individuals, but the urgency remains unchanged.
From: Sam Johnston
To: [Standards and Policy Director]
Cc: [Kwaai Leadership]
Date: Friday, October 18, 2024 at 4:42 PM CET
Subject: Making a connection here
[Standards and Policy Director],

As the OSI’s principal officer until the role began rotating recently, and someone I remember as always having fought the good fight over things like authentic open source community governance, I thought you of all people (but also the Open Source, though not necessarily AI, luminaries on the OSI’s board) would be absolutely horrified to see such a gratuitous redefinition of Open Source pass on your watch, just as AI and software merge (meaning the OSAID will become OSD 2.0 over time). It makes an absolute mockery of the four essential freedoms, effectively relegating FOSS to freeware.
Any artificial intelligence AND Open Source expert you ask (yes, they need to be both) will tell you this, though from what little we can learn of the OSI’s chosen (How? Why?) co-design process, the OSI now “believe[s] that everyone is an expert based on their own lived experience, and that we all have unique and brilliant contributions to bring to a design process”. This is basically the “Do Your Own Research (DYOR)” of technical standards, which are customarily determined by rough consensus. Little do they know that the supposed beneficiaries of this new process will be victims of it too, as it can be difficult if not impossible to study and modify AI systems to effectively eliminate bias without the “source”.
It’s especially unfortunate that the OSI is not even following its preferred process, which “see[s] the role of the designer as a facilitator rather than an expert”, given there’s only one voice left in the room (literally, as more and more experts are censored) and the facilitator is acting as judge, jury, and executioner. For example, they just summarily closed, without justification, [community member]’s neutral proposal to achieve consensus for the 1.0 release: a simple bugfix building on your work that would largely allay the fears of the objectors, mine included.
While there’s been much discussion of the subjective “preferred form”, the fact remains that “whatever form a developer changes to develop the program is the source code”, which means that for AI the training data must objectively be released. That is hardly a burdensome or unnecessary requirement: the inference code itself is little more than a life support system for the model. This should not be a contentious issue, and it is a reasonable compromise that, in addition to open data, allows public data (like Common Crawl) while shunning proprietary and closed data, the kryptonite for the Open Definition (“Open data and content can be freely used, modified, and shared by anyone for any purpose”) that will get users sued.
This compromise is not purist: a purist position would allow only the first of the four classes of data (open data), which could indeed relegate Open Source AI to a niche, as has been noted many times. It would nonetheless protect the four freedoms while following the norms of the AI industry. Bearing in mind that Open Source itself started out in a niche and grew on the strong foundation of the Open Source Definition, the argument for an even stronger definition is sound. From [community member]’s compromise position we could tweak to taste, a point [industry expert] made in his recent public appeal to the board: “I believe that we can always make the definition more permissive if we discover that we have been too ambitious with a definition, but that it is functionally impossible for us to be more ambitious later.”

At this point, I think it would have been better to just pick up the phone. Following my recent presentation to Kwaai we agreed to assign the policy group to the problem, and I know other teams are working on it too in the run-up to your proposed release Monday week. Those excluded from the OSI’s own censored community have regrouped outside of it, and we’ve started filing whistleblower complaints over the process (more so than the product) that led us to where we are today. Given the entire effort has been repeatedly tied to the EU’s AI Act (among other legislation in Washington DC, California, and abroad), we’re also crowdsourcing a lobbying referral, which will be ready to submit by release day (it already runs to hundreds of pages, and we’ve collected a lot more that we still have to sort through).
We see the OSI’s Open Source AI Definition (OSAID) in its current form as an asteroid heading straight for our little village — the $30bn Open Source industry that 98% of enterprises and virtually all Internet, cloud, and AI services rely on — and if it can’t be course-corrected, then we all owe it to our community and users to stop it.
Sincerely,
Sam
Kwaai Open Source AI Lab
Note: Though I initially considered the inclusion of public-but-not-open data a reasonable compromise, this was due to the OSI’s shifting of the Overton (“Openton”?) window of Open Source all the way to the closed end of the spectrum. I have since joined the community consensus that accepting even the second of the four classes of data described in the OSAID FAQ (the OSAID itself accepts all four, or no data at all) renders the resulting definition dysfunctional, and I have struck that position above.
This brings us to where things stand today. A significant step in addressing these concerns has been the subsequent submission of a Form 13909 Tax-Exempt Organization Complaint (Referral) to the IRS regarding potential lobbying activities. This effort, undertaken with input from numerous community and OSI members, resulted in a comprehensive submission supported by over 500 pages of documentation, including catalogued artefacts and AI-assisted transcripts of public presentations. The final submission summary alone ran to 17 pages. To ensure protection against retaliation, the referral was filed anonymously by one participant. For context, this post highlights the challenges individuals have faced when raising concerns, though its author was not involved.
Per the IRS process, we will not receive updates or acknowledgement of the anonymous submission. This filing is a procedural step to ensure transparency, not an accusation of wrongdoing, and is now out of our hands. Transparency is crucial for nonprofit governance, particularly when activities are tied to major legislative efforts. For example, while the OSI’s EU transparency reports (2018–2023) disclose broad lobbying activities related to the Cyber Resilience Act (CRA), the Product Liability Directive (PLD), the Digital Services Act, the Digital Markets Act, and the EU AI Act, their annual US Form 990s (2016–2022) answer “No” to Part IV, Line 4: “Section 501(c)(3) organizations. Did the organization engage in lobbying activities, or have a section 501(h) election in effect during the tax year? If ‘Yes,’ complete Schedule C, Part II”. This implies that no lobbying activities were reported in the US, raising broader questions about the consistency of lobbying disclosures globally.
It’s not for us to determine how much, if any, of the OSI’s activities under new management (the founders having been banned by the current leadership) constitute lobbying, or whether they are stretching or potentially breaking the subjective rules strictly limiting it. We are not alleging any wrongdoing by the OSI or its leadership. However, in our opinion, the majority of their activity over the past year, including the entire OSAID effort, which has been repeatedly tied on the record to federal, state, and foreign legislation such as the EU AI Act, warrants closer scrutiny by the appropriate authorities. You can form your own conclusions using tools like this simple flowchart.
Whatever happens next—which may amount to nothing, given the US is itself transitioning to new management—it’s critical to recognise that this situation is not the community’s fault. These rules are in place to protect US taxpayers by ensuring transparency and accountability in tax-exempt organisations, not to hinder the efforts of those educating and advocating for Open Source principles. As OSI members and whistleblowers, we are fulfilling our responsibility to bring potential issues to light.
The OSI itself has previously recognised the significance of these rules, having discussed them in their own blog post, FOSS Nonprofits: Judged on Their Merits at the IRS?. Notably, Open Source Software once featured on the IRS’s Be-On-the-Lookout (BOLO) list, demonstrating the scrutiny such organisations can face. The OSI has even discussed and minuted compliance concerns during recent board meetings, underscoring the importance of ensuring proper processes are followed.

Despite the inevitable doublespeak, we are the ones defending Open Source, not attacking it, and would rather see the OSI remediated than ruined. To thrive, Open Source must remain rooted in its foundational principles. Addressing the damage caused by the OSAID is a necessary step toward restoring trust and ensuring the movement’s long-term sustainability.