
The OSI lacks competence to define Open Source AI

The Open Source Initiative (OSI) has stepped up its censorship of fellow advocates for the four essential freedoms (to use, study, modify, and share software): moderating, silencing, and threatening permanent community bans against proponents of an Open Source AI Definition (OSAID) that fully protects said freedoms by requiring that users have access to the “source” of AI models, namely their training datasets.

In response to yesterday’s post disclosing their proposal’s critical security issues (Turning Open Source AI Insecurity up to 11 with OSI’s RC1), their community manager stepped up the authoritarianism, demanding that contributors respect the Community Guidelines or face “appropriate actions”. This is never a good sign in a purportedly open (i.e., open-washed) process run by a non-profit organisation with “open” in its name, but as you’ll see, the reality is even more troublesome.

Founding Father

It seems a successfully silenced community is the best kind of community, especially when you’re trying to ram through a redefinition of a quarter century of Open Source that everybody else hates, including the original author of the Open Source Definition (OSD) and founder of the OSI, Bruce Perens:

I am the creator of the original Open Source Definition, and support Amanda’s view that the Open Source AI Definition is flawed, that OSI hasn’t done a great job and so weren’t necessarily the best team to do this. In my opinion, the result is less than Open Source.

I have supported Amanda’s views for a year, since she walked off of the OSI board, and told me she felt they were going about it wrong and nobody would listen.

You can apply the original Open Source Definition to machine learning and I feel it works better. You need to apply it to two pieces: the software (and possibly specialized hardware), and the training data. The training data is “source code” for the purposes of the Open Source definition. This is complicated because some models evolve with each question asked, etc., and thus the training data is ever changing and becomes very large.

I have a different approach to machine learning for Post Open, see https://postopen.org/ and the zero-cost license there. This is evolving and your input is appreciated.

The hubris it takes for transitional custodians who are obviously way out of their depth to override the creator of the very definition they seek to rewrite is truly remarkable. The only reason I can imagine there’s not more outrage is that the wider Open Source community haven’t yet realised the true significance of it, and may not until it’s too late, with a 1.0 launch due on 28 October 2024 at the All Things Open conference.

More and more software is being written by and incorporating AI. AI is eating software, which is eating the world. It’s likely that soon the two will be indistinguishable, and the only standard that will matter will be an Open Source AI definition that effectively neuters Open Source, turning it into Freeware. It’s like the community’s worst enemy has been given the keys to the castle by a temporary guard, and it makes one wonder whose side they’re on.

Foxes Guarding the Henhouse

Curious as to what someone with a history of concise, relevant posts (the most recent being “No hay peor sordo que el que no quiere oír”, “There is no worse deaf person than the one who does not want to hear”, an apt proverb given the OSI’s belligerently selective hearing of late) could possibly have said in response to my Squaring the Circle of Open Source AI without Data post to warrant the censor’s blade (“This post was flagged by the community and is temporarily hidden.”), I contacted her to find out. It turns out she had posted this reply to another community member:

OSD cannot be defined by people who could not build software. OSAID cannot be defined by people who could not build AI.

To be fair, OSI has acknowledged its ignorance on the subject many times and has summoned various experts, including from Google, Meta, and Microsoft, to help co-design OSAID.

Which is a bit like 1998 Perens and ESR asking Ballmer’s Microsoft how to define open source, but without knowing how to program themselves.

I’m not going to reveal the content of private discussions, but they’re not happy, and the board owes another apology, especially as they’ve recently seen fit to moderate discussions themselves. The first apology, for publicly shaming a critic and their employer on a professional network, remains outstanding. Now they’re making an absolute mockery of a co-design process that claims to give “special attention […] to diversity, equity and inclusion” by making an under-represented minority (a woman) responding to another under-represented minority (BIPOC) feel unwelcome because they didn’t restrict themselves to OSI-Approved talking points.

Apparently, said censor has never heard of the Streisand effect (“an unintended consequence of attempts to hide, remove, or censor information, where the effort instead increases public awareness of the information”), as their censorship immediately led me to investigate the claim.

Blind Leading the Blind

I was shocked to discover (with the help of AI research & reasoning engine Perplexity) that not a single person involved in the governance of the process even claims to have expertise in AI:

Do any of the Open Source Initiative (OSI) team or board claim expertise in Artificial Intelligence?

The Open Source Initiative (OSI), a prominent organization in the open source software community, does not appear to have team or board members with explicitly stated expertise in Artificial Intelligence based on their publicly available profiles. While the OSI’s leadership possesses diverse backgrounds in open source software, licensing, and community building, their biographies do not specifically highlight AI-related experience or qualifications.
[…]
AI Expertise Absence

Despite the diverse backgrounds and extensive experience in open source software, licensing, and technology among the OSI team and board members, there is no explicit mention of Artificial Intelligence expertise in their publicly available profiles. This absence of AI-specific qualifications is notable given the increasing importance of AI in the tech industry. […]

Thus, the charge of admitted ignorance, and the analogy of two white males (since apparently demographics are now relevant to the technical question of which collection of components fully protects the four essential freedoms) asking a third, one who infamously claimed Linux is a cancer, to write the definition because they lack standing, are both apt and hardly worthy of censorship, which is in any case unsupported by the Community Guidelines.

Smoke-Filled Back Rooms

Recall that my recent audit of the voting data still cited publicly by the OSI to justify the decision to exclude training datasets revealed that not only was a superpower to issue vote-cancelling negative votes granted, it was granted selectively, only to the working group studying Meta’s Llama models. For the four(!) products sampled, from a population of over 1,000,000 free models and 200,000 free datasets on Hugging Face alone, employees of those products’ vendors were invited to participate and vote on the basis that only they had enough knowledge to assess them. “A sheep that invites a wolf to dinner shouldn’t be surprised when it becomes the meal”, and this was exactly like handing over the reins to Ballmer’s Microsoft.

On review of the detailed voting records, one of Meta’s lawyers was found to have used that superpower against every component in the data category, including training datasets, wiping out votes from other working groups. My subsequent statistical analysis, eliminating that democratic distortion, predictably found that training datasets must be required by the definition. The OSI apologised, only to maintain the faulty decision, its executive director later claiming that “those results were never meant neither to be scientific, nor representative, nor democratic or anything like that.”
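For illustration only, here is a minimal sketch in Python of the kind of re-tally involved, assuming hypothetical vote records (the actual component names, spreadsheet format, and weightings differ):

```python
# Hypothetical illustration of the re-tally; only the method is shown here,
# not the actual OSI voting data.
from collections import defaultdict

# (component, vote) pairs; a vote of -1 is the "vote-cancelling" negative
# vote that was selectively granted to a single working group.
votes = [
    ("training datasets", +1), ("training datasets", +1),
    ("training datasets", -1),  # one negative vote cancels a positive one
    ("training code", +1), ("training code", +1),
    ("model weights", +1),
]

def tally(votes, include_negative=True):
    totals = defaultdict(int)
    for component, vote in votes:
        if vote < 0 and not include_negative:
            continue  # eliminate the distortion: count endorsements only
        totals[component] += vote
    return dict(totals)

print(tally(votes))                          # with cancellation: {'training datasets': 1, ...}
print(tally(votes, include_negative=False))  # without: {'training datasets': 2, ...}
```

The point is simply that a single, selectively granted negative vote can flip the apparent outcome for a component; remove the distortion and the endorsements stand on their own.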

The entire “co-design” process is an irretrievably tainted sunk cost and has been described as “a masterpiece of corporate capture” in a now-silenced community member’s complaint:

Process: The co-design process as conducted was unclear to participants and its vulnerabilities have been exploited by Meta.

Implications

The OSI, at least in its current composition, should never have taken on the challenge of redefining Open Source, answering a question nobody asked. They’ve never done anything like this before, as the original Open Source Definition (OSD) predates their incorporation and was a rebadged version of the Debian Free Software Guidelines (DFSG). Such an important standard was not the place to start experimenting. When faced with sustained opposition from “many smart and reasonable individuals with decades of open source experience [who] regard this as insufficient”, they should have immediately stopped, conducted a review, and worked to find rough consensus, as is typical for Internet standards. It’s not too late for them to do that, even in time for their own self-imposed 28 October deadline, but I’m not holding my breath.

The OSI has yet to address cloud computing after almost 20 years, despite that community’s invitation (which resulted in the formation of the Open Cloud Initiative), and the sudden urgency to influence AI is apparently being driven not by the needs of Open Source users or projects like ours, but by forbidden political lobbying in relation to the EU AI Act (a foreign law) and its equivalents in Washington DC and California, which jeopardises their tax-exempt status (a sufficiently important topic that it deserves dedicated investigation in subsequent articles). Furthermore, one of their own board alumni has just called into question whether they even have the mandate of their own membership. One really does wonder what could possibly drive them to this kamikaze mission.

Subject Matter Experts?

Lacking competence isn’t terminal, in that it can be partially compensated for by engaging subject matter experts (i.e., educated and experienced AI practitioners), but only to the extent the board is then capable of digesting and deciding on the results, which is yet to be seen, with a vote expected any day. However, this should have been done with a narrow technical focus, answering the question “What components are required to guarantee the protection of the four essential freedoms for AI?”.

Despite drowning in volunteer experts (or maybe because of it), the board decided to outsource its founding document to a contractor who also lacks AI expertise, being instead an expert in the “co-design” methodology, which “addresses the challenges of reaching an agreed definition within a diverse community (Costanza-Chock, 2020; Escobar, 2018; Creative Reaction Lab, 2018; Friedman et al., 2019)”, a job I’m sure they do very well in the right context. While I have no doubt it’s useful for improving the quality of AI applications like bank loan approvals and airport scanners (having just studied AI, Ethics, & Society for my MS in ML), it’s less obvious what concepts like #TravelingWhileTrans and the Black feminist “Matrix of Domination (white supremacist heteropatriarchy, ableism, capitalism, and settler colonialism)” have to do with drafting a boring technical standard on which so many lives and livelihoods depend.

Recall that the Open Source Definition (OSD) was published in 1998 as a rebadged version of the Debian Free Software Guidelines (DFSG) from 1997, objectively determined from what was deemed technically necessary by software engineers to demonstrably protect the four essential freedoms stemming from the Free Software Definition (FSD), itself dating back to 1986. Absolutely no [re]definition whatsoever was called for, rather a simple, objective technical mapping from source (for software) to data (for AI). Or alternatively, the application of the unmodified Open Source Definition to the code and data per the original author’s instructions above. How this turned into thousands of hours of paid & volunteer work is beyond me.

Interestingly, “the community adopted the four freedoms for software”, only then to inexplicably decree that they needed to be “adapted for AI systems”. If this is setting off alarm bells for you, that’s because it should. These four simple, clear, well-accepted freedoms, copied directly from the release candidate (which then goes on not to protect them), are either protected or they are not; they do not need to be “adapted” in any way:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and inspect its components.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.

Proposal for Consensus

What it means to be “Open Source” was settled by Bruce Perens et al. over a quarter century ago, carving out space for projects like our own Personal Artificial Intelligence Operating System (pAI-OS), which was initiated by Kwaai (an Open Source AI Lab) around the same time as this process. Given that an ineffective and unimplementable Open Source AI definition risks our being crowded out by opaque, open-washed offerings from commercial vendors, it presents an existential threat to us, which is why we are determined to see a more reasonable compromise reached. Every hour of Kwaai time I spend doing this is an hour I’m not spending on that, so I hope sanity prevails soon, with a workable compromise rather than this capitulation to corporate interests.

A community leader’s proposal to achieve consensus for the 1.0 release, hosted in an open GitHub repo at OSAID-WIP (the OSI’s own repo is private, which is telling in itself), neatly solves the issue by eliminating the two most problematic classes of data: “obtainable” proprietary data (“including for fee”), which risks users being sued by custodians like the NYT, and “non-public” data like Facebook’s social graph, which is not available for any amount of money, making requiring it like specifying unicorn horns in a recipe.
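To make the distinction concrete, here is a minimal sketch of the resulting rule, assuming the draft’s data-class terminology (the DataClass names and the qualifies() checker are illustrative, not part of any actual proposal):

```python
from enum import Enum

class DataClass(Enum):
    OPEN = "open"              # released under an open licence
    PUBLIC = "public"          # freely available to anyone at no cost
    OBTAINABLE = "obtainable"  # proprietary, possibly only "for fee"
    NON_PUBLIC = "non-public"  # unavailable at any price (e.g., a private social graph)

# The two classes the compromise proposal would eliminate from the definition
EXCLUDED = {DataClass.OBTAINABLE, DataClass.NON_PUBLIC}

def qualifies(training_data_classes: set) -> bool:
    """True if none of the model's training data falls into an excluded class."""
    return not (training_data_classes & EXCLUDED)

print(qualifies({DataClass.OPEN, DataClass.PUBLIC}))      # True
print(qualifies({DataClass.OPEN, DataClass.NON_PUBLIC}))  # False: unicorn horns
```

Under such a rule, a model qualifies only when every training dataset is open or at least public, which is precisely what makes the freedoms to study and modify exercisable in practice.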

It’s time to put all this mess behind us and seriously consider adopting this compromise as a community, with or without the blessing of the OSI, whose opinion, in my opinion, isn’t worth the “almost 3 years” and thousands of hours they’ve spent coming up with it. The OSI are already pointing at the vanity metric of unnamed “endorsers” as a reason not to negotiate, but if they can’t be convinced, they can be circumvented.