AI Privacy: The Complete Guide to Protecting Your Data
AI privacy comes down to one sentence: you don't control what a provider (OpenAI, Anthropic, Google, Meta) does with your data, but you control what you send it. By default, several consumer services train their models on your conversations, can retain them for years, and can route them to human reviewers; opt-out settings help but never cover everything. So the only robust protection, valid across every tool, is to minimize and anonymize sensitive information before pasting it into a prompt. The CNIL, France's data protection authority, recommends exactly this: don't share confidential information in a consumer AI service. This guide takes the full tour: what is done with your data, the risks, the protections, and what to do by profile.
What AI does with your data by default
On consumer accounts, three processes often coexist by default unless you explicitly opt out. First, training: your conversations are used to improve the models. On August 28, 2025, Anthropic changed its consumer terms (Claude Free, Pro, Max and Claude Code) to use chats and coding sessions for training unless the user declines, with a choice to make before September 28, 2025. Second, retention: under that same change, retention extends to five years for anyone who doesn't opt out, whereas prompts and responses were previously generally deleted within 30 days; enterprise offerings (Claude for Work, for Education, Gov, API) are not affected. Third, human review: flagged conversations (abuse, illegal content, risk of harm) can be escalated to human reviewers — OpenAI says it analyzes conversations to detect a threat of imminent physical harm, which can be routed to reviewers and, where warranted, to law enforcement. We break each of these down in our articles “Does AI train on your data?” and “What personal data does ChatGPT collect about you?”.
The real risks: leaks, indexing, litigation, GDPR
Beyond settings, four concrete risks are documented. Sharing conversations can create public exposure: in 2025, ChatGPT conversations shared via a “make discoverable” option ended up indexed by search engines like Google; some contained names, résumés and details that allowed identification via LinkedIn. After these reports, OpenAI removed the feature, its spokesperson explaining that it “introduced too many opportunities to accidentally share things you didn't mean to.” Content you type can also become evidence producible in litigation: in the New York Times v. OpenAI case, a May 2025 decision forced OpenAI to preserve ChatGPT conversation logs — including those users had deleted — and in November 2025 the court ordered the production of 20 million de-identified logs (see OpenAI's note “response to NYT data demands”).
- Employee leaks: in April 2023, Samsung engineers pasted proprietary semiconductor source code and confidential internal meeting notes into ChatGPT on several occasions in under 20 days; Samsung then restricted staff use of generative AI.
- Indexed shared chats: a “discoverable” share can make a conversation visible in a search engine, along with the data it contains.
- Litigation production: a legal obligation can freeze data that was supposed to be deleted and require its production.
- GDPR non-compliance: sending personal data into a consumer tool can breach your obligations, especially without a legal basis or a framework for transfers outside the EU.
GDPR can apply both to the models themselves and to how they are used. In its Opinion 28/2024 (adopted December 17, 2024), the European Data Protection Board (EDPB) considers that a model trained on personal data cannot automatically be considered anonymous: anonymity must be assessed case by case. The opinion also sets out the conditions for relying on legitimate interest via a three-step test. The CNIL, for its part, stresses that personal data must be protected in training datasets, in models that may have memorized them, and in prompts. We expand on this in “GDPR and generative AI compliance.”
The protections: settings, enterprise offers, anonymization
Provider-side protections exist but are limited. Privacy settings are not protective by default: you have to opt out of training yourself, and the training toggle is sometimes pre-set to “On”. Temporary chats reduce persistence, and enterprise offerings come with a data processing agreement (DPA). The CNIL recommends that organizations disable the provider's reuse of usage data, sign a DPA specifying access limits, verify the absence of training and the compliance of transfers outside the EU, and consider on-premise solutions for sensitive data. But none of these settings undoes what has already fed a model, nor prevents production in litigation. That's why the most robust measure is upstream, on the content: remove sensitive data before sending. Our guide “How to anonymize your data before using AI” details the method, and “Which AI chatbot is most private?” shows why the setting matters more than the brand.
| Protection | What it covers | Its limit |
|---|---|---|
| Training opt-out | Stops future use of your chats for training | Doesn't remove what already fed a model (forward-looking only) |
| Temporary chats / deletion | Reduces the persistence of your history | Residual retention (abuse monitoring) and legal holds remain possible |
| Enterprise plan + DPA | Excludes training by default, contractually limits access | Limited to negotiated contracts; provider's jurisdiction unchanged |
| Anonymization before sending | Sensitive data never leaves your device — covers every tool | Requires detecting and masking the sensitive parts before pasting |
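The last row of the table, detecting and masking before pasting, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete detector: the three patterns (`EMAIL`, `API_KEY` with a hypothetical `sk-` prefix, `IBAN` without spaces) and the function names `find_sensitive` and `mask` are invented for the example, and regular expressions alone will miss names, addresses and free-text details.

```python
import re

# Hypothetical pattern set, for illustration only. Real detection needs far
# broader coverage (names, addresses, national ID formats, other key formats).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),   # assumed key format
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}


def find_sensitive(text):
    """Return (category, value) pairs for every match, ordered by position."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), category, match.group()))
    hits.sort()
    return [(category, value) for _, category, value in hits]


def mask(text):
    """Replace every match with a fixed placeholder such as [EMAIL]."""
    for category, pattern in PATTERNS.items():
        text = pattern.sub(f"[{category}]", text)
    return text


prompt = ("Contact jane.doe@example.com, key sk-abcdefghij0123456789abcd, "
          "IBAN FR7630006000011234567890189.")
print(find_sensitive(prompt))
print(mask(prompt))
```

Masking with fixed placeholders is one-way: the text that leaves your machine no longer contains the values, but you cannot map the AI's answer back to them automatically.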
What to do by profile
Needs vary. An individual should avoid entering medical or financial data or ID documents into a consumer tool. A team or company must prevent leaks of code, contracts, customer data and meeting notes; the Samsung case shows that what leaks is often a trade secret rather than personal data, so upstream anonymization has to cover both. Sensitive sectors (healthcare, finance, legal) are subject to heightened obligations (GDPR, professional secrecy, trade-secret protection) that make anonymization all the more necessary; the CNIL recommends internal policies defining permitted and prohibited uses, data protection impact assessments (DPIAs), and the appointment of a DPO.
- Individual: never paste a social security number, bank details or medical results; opt out of training; for everything else, anonymize before sending.
- Team / company: forbid pasting source code, contracts and customer data in the clear; define the permitted uses; anonymize what still has to be processed by an AI.
- Sensitive sector (healthcare / finance / legal): combine an enterprise plan with a DPA, a DPIA, a DPO and systematic anonymization, which is the only measure that holds whatever the tool.
That's exactly what ONYRI Sanitize is for: since you can't control what the provider does with your data, the lever that remains is to never hand it the sensitive parts. The engine detects sensitive data — from a name to an API key — and replaces it with reversible tokens; detection and the token↔value mapping stay in your browser, and only anonymized text reaches the tool. Whether the conversation is trained on, kept for five years, reviewed by a human or produced in litigation, it only contains tokens — not your real information.
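To make the idea of reversible tokens concrete, here is a minimal Python sketch of the general technique. It is not ONYRI Sanitize's implementation, which is not described in this guide: the token format (`<EMAIL_1>`), the two patterns and the function names `anonymize` and `restore` are all assumptions made for illustration, and the code runs locally in Python rather than in a browser.

```python
import re

# Hypothetical patterns; a real engine would detect many more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),   # assumed key format
}


def anonymize(text):
    """Replace sensitive values with tokens; return (safe_text, mapping)."""
    mapping = {}    # token -> original value; this dict never leaves your machine
    seen = {}       # original value -> token, so a repeated value reuses its token
    counters = {}   # per-category counter used to number the tokens

    def make_replacer(category):
        def replace(match):
            value = match.group()
            if value not in seen:
                counters[category] = counters.get(category, 0) + 1
                token = f"<{category}_{counters[category]}>"
                seen[value] = token
                mapping[token] = value
            return seen[value]
        return replace

    for category, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(category), text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into text that contains tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


prompt = "Email jane.doe@example.com about key sk-abcdefghij0123456789abcd"
safe_prompt, mapping = anonymize(prompt)
print(safe_prompt)   # only this string would be sent to the AI tool
print(restore("Draft sent to <EMAIL_1>.", mapping))
```

Only `safe_prompt` is meant to be sent; `mapping` stays on your side and is applied to the answer with `restore`. Reusing one token per distinct value keeps the anonymized text coherent, so the model can still tell that two mentions refer to the same thing.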
Frequently asked questions
- How do you protect your privacy with AI?
- Start from the principle that you don't control what a provider does with your data, but you control what you send it. Opt out of training and use temporary chats, but above all never paste sensitive data in the clear: minimize and anonymize information before sending. The CNIL explicitly recommends not sharing confidential information in a consumer AI service.
- Are AIs like ChatGPT, Claude or Gemini safe for confidential data?
- Not by default on consumer accounts: they often train on your chats unless you opt out, keep them for a long time (up to 5 years at Anthropic for anyone who doesn't opt out) and can have them read by humans. Enterprise plans with a DPA improve things, but for truly confidential data, the safest measure remains anonymizing it before sending.
- Is a training opt-out enough to protect my data?
- No. The opt-out stops future use of your conversations for training, but it doesn't remove what already fed a model, and it prevents neither retention tied to abuse monitoring, nor occasional human review, nor a legal hold on the data. It reduces exposure; only anonymizing the content before sending is independent of the tool and its settings.
Sources & references
- Q&A on the use of a generative AI system (don't enter confidential data, verify the provider's reuse, DPA, transfers outside the EU) — CNIL
- Anthropic users face a new choice — opt out or share your data for AI training (retention extended to 5 years, September 28, 2025 deadline, affected plans) — TechCrunch
- OpenAI does away with feature that made ChatGPT conversations discoverable by Google (shared chats indexed, feature removed) — Fortune