Fork you and the Open Source AI Definition you rode in on…

Forking is the latest hobby horse of the Open Source Initiative (OSI), so let’s flog this dead horse too:

“An Open Source AI machine is one that the people can meaningfully fork without having to ask for permission.” — Open Source Initiative (OSI)

“Indeed! Pity many don’t grasp this very basic concept.” — OSI Board

“The current OSAID requirements may not fully enable meaningful forking, as the ability to study and modify the system is limited without access to the actual training data that shaped its behavior.” — Perplexity

Explanation by AI

I asked Perplexity to ask Claude “What is required for an AI system to be ‘meaningfully forked’, and does the OSI’s OSAID provide it?” and this is what it had to say.

It also created a public page on the topic: Requirements for Meaningfully Forking AI

To meaningfully fork an AI system, several critical components are required:

Core Requirements

Training Components

  • Complete source code for data processing and training
  • Model parameters and intermediate training states
  • Training data information, including provenance, selection methods, and processing methodologies

Data Requirements

Most experts and AI practitioners agree that access to training data is essential to meaningfully study and modify an AI system. The ability to understand and modify the system’s behavior is fundamentally tied to the data used to train it.

OSAID’s Approach

The OSAID’s position on these requirements is mixed:

What’s Included

  • Requires complete code for data processing and training
  • Mandates disclosure of model parameters
  • Requires detailed information about training data, including its source and processing methods

What’s Missing

The OSAID does not require the actual training data to be made available. Instead, it only requires “detailed information” about the data, which has been a point of controversy.

Controversy and Criticism

The decision not to require training data access has sparked debate in the open source community:

  • Critics argue this sets the bar too low and may undo decades of work in open source standards
  • Some argue this approach enables “black boxes” to escape proper scrutiny while still claiming to be open source
  • Training data is considered analogous to source code in traditional software, where access to the preferred form for modification is fundamental

The current OSAID requirements may not fully enable meaningful forking, as the ability to study and modify the system is limited without access to the actual training data that shaped its behavior.