OpenAI, the organization behind the breakthrough AI chatbot ChatGPT, has recently introduced new data controls (see their Data Controls FAQ). These options are designed to give users more power over their privacy: you can turn off chat history storage, which also prevents your conversations from being used to train OpenAI's models. Notably, both options share a single toggle, which is unlikely to be an accident of user interface design. The ability to retain chat history while also opting out of model training is intended to be available only to business customers (see Introducing ChatGPT Enterprise).
This new development presents a dilemma for individual users: either keep the useful chat history with the trade-off of it being used for model training, or disable history storage to prevent any training use. Fortunately, for those seeking both privacy and functionality, OpenAI has provided an interim solution.
OpenAI's User Content Opt Out Request form allows users to proactively opt their ChatGPT data out of being used for model training. This means users (the form applies at the organisation level) can continue enjoying the full chat history functionality while ensuring their conversations aren't used for model training without their explicit consent. Importantly, OpenAI has reassured users that any previous opt-out requests submitted through customer support will continue to be honoured.
Further, the ability to selectively opt out of AI model training aligns with users’ data privacy rights under regulations like the EU’s General Data Protection Regulation (GDPR). OpenAI will likely need to maintain some form of opt-out program to comply with data protection laws as ChatGPT expands globally.
As a (very) early adopter, active advocate, avid API user, and paid ChatGPT Plus subscriber, I don't feel too guilty about restricting the use of my often confidential data in this way. Indeed, for paid ChatGPT Plus subscribers, restricting the use of our data for further model training seems reasonable, given we're already contributing financially towards the operation of the service. However, free users may want to consider allowing their data to be used for training, given the substantial computational costs of developing and running advanced AI capabilities.
OpenAI is not a charity (though it could be argued that it once looked like one), and it's rare to find a publicly available service that isn't monetised through advertising or subscriptions. Bing's GPT-4-based chat is in some ways better anyway: it supplements the September 2021 training cutoff with live Bing searches, so you can ask it about current events, like the war in Ukraine, that ChatGPT surprisingly knows nothing about. In any case, the situation is reminiscent of the famous Geek&Poke comic in which the pigs enjoying the free barn and food are, in fact, the product.
While OpenAI could change or remove this opt-out method in the future, for now it gives individual users more granular control over how their data is used for training. It's an interim measure that allows users to have their cake and eat it too. However, as OpenAI points out, choosing not to share data may limit the ability of their models to better address your specific use case. This is a trade-off we might have to accept for increased privacy. For instance, without sharing my voice samples, Alexa could have a hard time understanding my Aussie accent!
Another hot tip worth noting for more advanced users: OpenAI's API terms state that data submitted via the API is not used for model training, including when the API is accessed via third-party tools like the excellent Poe (courtesy of Quora) and TypingMind (available to fellow macOS users via Setapp). You do still need to be cognisant of those tools' own privacy policies, though.
In conclusion, OpenAI’s current opt-out abilities are a step towards balancing the development of AI capabilities with user privacy. But the conversation around data usage controls is far from over. As AI services continue to evolve, what do you think should be the ideal balance between user privacy and the advancement of AI capabilities?