
Medical applications of Open Source AI


Given how often the Open Source Initiative (OSI) invokes medical applications as a justification for letting vendors conceal the source (i.e., the data) under its flawed Open Source AI Definition (OSAID), I’ve updated the So, you want to write about the OSI’s Open Source AI Definition (OSAID)… article with the following section:

What are the implications for medical applications?

The OSI asks “Do you want Open Source AI built on medical data?”, and the answer is a qualified “yes”.

In addition to copyright, medical data is generally subject to strict privacy controls, which make it inherently unsuitable for training Open Source AI models. That’s fine, because not everything has to be Open Source, and some things are better off not being Open Source. Given that we know AI models can reveal their training data, the suggestion that the training process offers adequate protection for such sensitive data is as bogus as it is dangerous, both to data subjects and to those seeking to rely on such assurances to avoid legal liability.
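To make the leakage risk concrete, here is a minimal sketch of a membership-inference probe against a causal language model. It assumes a hypothetical weights-only model trained on patient records; the model name ("gpt2", used only as a stand-in) and the record strings are illustrative placeholders, not real data or a real medical model.

```python
# Minimal sketch: a record the model was trained on tends to get a markedly
# lower loss (higher likelihood) than a comparable record it has never seen.
# "gpt2" and the record strings are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in for a hypothetical weights-only medical model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def record_loss(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

candidate = "Patient 4821, DOB 1962-03-14, biopsy confirmed adenocarcinoma ..."
control   = "Patient 9999, DOB 1980-01-01, routine check-up, no findings ..."

# A large gap between the two losses suggests the candidate record was
# memorised during training -- exactly the exposure described above.
print(record_loss(candidate), record_loss(control))
```

If the candidate record scores a conspicuously lower loss than the control, the model has likely memorised it, and withholding the training data does nothing to put that information out of reach.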

Anyone claiming that the ability to conceal the source (i.e., the data) is a feature of the OSAID — or conversely that requiring the data is a limitation of more meaningful definitions — either doesn’t understand the technology, or does and is deliberately deceiving you. Either way, in doing so they have proven themselves incompetent to create such a definition. Indeed, allowing the publication of medical models as Open Source without requiring the data (as the OSAID does) actually impairs the industry, by passing off models that cannot be fully studied or reproduced as Open Source. In any case, Open Weight models would be published with or without the Open Source moniker, so there is significant cost and no benefit in extending the definition to cover them.

Fortunately, medical data (e.g., lung cancer scans) gathered with explicit consent may be suitable for training medical models, in which case its availability under a meaningful Open Source definition that requires the release of training data will enable researchers to study and modify such models as a foundation for future ones. The same cannot be said for Open Weight models, which do not include the training data, severely limiting the freedom to study and modify the model to, for example, fine-tuning. Furthermore, patients have a strong incentive to grant such consent, as the research could help save their own lives, or those of loved ones and others.
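To illustrate the practical difference between the two kinds of release, here is a minimal sketch of what each one permits. It again uses "gpt2" purely as a stand-in for a hypothetical released medical model; the comparison is illustrative, not a training recipe.

```python
# Minimal sketch of the practical gap between Open Weight and Open Source
# (data included) releases. "gpt2" is a hypothetical placeholder.
from transformers import AutoConfig, AutoModelForCausalLM

# Open Weight release: all you can do is start from the published checkpoint
# and fine-tune it; the data it was built from stays hidden.
weights_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Open Source release with training data: the corpus itself can be audited,
# filtered, or extended, and a fresh model trained from scratch.
config = AutoConfig.from_pretrained("gpt2")
from_scratch = AutoModelForCausalLM.from_config(config)  # random init, ready to train on the released data
```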