Top 10 Types of Sensitive Data You Should Never Put Into AI
Never paste passwords, banking details, health data or national IDs into a public AI. The ten riskiest data types, ranked by risk, plus the fix that covers them.
Some data should never leave your organisation inside a public AI prompt. Here are the ten riskiest kinds, ranked worst to least bad. At the top: passwords and access keys. Then money, health, and official IDs. The rule is simple. If you would redact it in a public document, don't paste it into ChatGPT, Claude or Gemini. The fix fits in one line: anonymize the text before you send it.
The Top 10 at a glance
Here is the ranking, from highest risk to lowest. Rank one does the most immediate damage. Each line also names the ONYRI detector family that covers it.
1. Passwords, API keys and login credentials. One leaked key can open direct access to your systems. ONYRI catches these through its CREDENTIAL and TECHNICAL families.
2. Financial data: bank details, card numbers, IBAN. They lead straight to fraud. FINANCIAL family.
3. Health data. The GDPR treats it as a special, highly protected category. PERSONAL family.
4. National IDs: social security numbers, the UK NINO. They enable identity theft. PERSONAL family.
5. Third-party personal data: clients, patients, colleagues. Exposing it breaks their trust and the law. PERSONAL family.
6. Contracts, legal documents and trade secrets. A revealed secret loses its value for good. CORPORATE and STRATEGIC families.
7. Source code and internal company data. They describe the technical core of your business. TECHNICAL and STRATEGIC families.
8. A full name tied to contact details. Together, these pin down a specific person. PERSONAL family.
9. Login and session data: tokens, cookies, access IDs. They replay your session without a password. CREDENTIAL and TECHNICAL families.
10. Anything you would redact in a public document. When in doubt, take it out. ONYRI's six families cover the whole list.
| Rank | Item | Why it's risky |
|---|---|---|
| 1 | Passwords, API keys, credentials | Direct access to your systems if the key leaks |
| 2 | Financial data (bank, card, IBAN) | Leads straight to fraud |
| 3 | Health data | Special, highly protected category (GDPR) |
| 4 | National IDs (SSN, NINO) | Enable identity theft |
| 5 | Third-party personal data | Breaks the trust of clients and colleagues |
| 6 | Contracts, legal, trade secrets | A revealed secret loses its value for good |
| 7 | Source code, internal data | Expose the technical core of the business |
| 8 | Full name + contact details | Pin down a specific person |
| 9 | Login / session data | Replay a session without a password |
| 10 | Anything you would redact | When in doubt, take it out |
The top of the list: what hurts most
Rank one is technical secrets: a password, an API key, a credential. Take a concrete case. A developer pastes a snippet into a chatbot to get it debugged, with the cloud key still inside. That one key can be enough to open the whole infrastructure. In April 2023, Samsung engineers leaked confidential data through ChatGPT. One incident involved code tied to semiconductor manufacturing. Another involved the transcript of an internal meeting, uploaded to generate minutes. The result: Samsung banned generative AI tools for its employees.
Just behind come money and health. An IBAN, a card number, a statement: enough to fuel fraud. Health goes further. The GDPR, like its UK version, places health data in the “special categories” of Article 9. That list also includes ethnic origin, political opinions, beliefs, sex life and biometric data. This information demands stronger protection. Pasting it into a consumer chatbot runs against that requirement.
Official IDs come next. A social security number or a UK NINO opens the door to impersonation. Finally, beware of data that isn't even yours. A client's name, a patient's file, a colleague's record. Exposing it puts the responsibility on you, not on them.
Why a public AI makes it worse
A query sent to a public AI like ChatGPT is visible to the provider. The NCSC makes this clear. Those queries are stored. They will almost certainly be used to develop the service or the model. The provider, or its partners, can read them. They can also fold them into future versions of the model.
Storage creates a second danger. The NCSC warns that queries kept online can be hacked, leaked, or made public by accident. That includes information that identifies the user. And the operator could be bought by a company with a different view of privacy.
By default, ChatGPT trains on conversations from personal Free, Plus and Pro accounts. You can decline this under Settings > Data Controls. But that opt-out is forward-looking only: data already used in a finished training run cannot be pulled back afterwards. Per the OpenAI Help Center, business and API inputs are not used for training by default.
The risk is legal too. In December 2024, Italy's data protection authority, the Garante, fined OpenAI 15 million euros. The main reason: processing personal data without a clear legal basis to train ChatGPT. The Rome court annulled that fine on 18 March 2026, but on a jurisdiction question between regulators, not on the merits. Another example. In The New York Times v. OpenAI, a federal preservation order from May 2025 forced OpenAI to keep its output logs. That included conversations some users had deleted. In November 2025, the U.S. District Court for the Southern District of New York ordered the production of 20 million de-identified conversation logs.
How to use AI safely: the fix
The good news: the fix is simple and well known. It's called minimisation. You send only what is strictly needed, and you strip identifiers before sending. To frame what counts as sensitive, see our guide to what sensitive data actually is, which breaks down the six families. Our how-to on anonymizing your data before using AI walks through the steps, one by one.
- Never send a password, key or token — even inside a code snippet.
- Strip names, emails and national IDs before you paste the text.
- Don't share a third party's data without permission.
- When in doubt, run the redaction test: would you remove it from a public document?
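As a rough illustration of the checklist above, here is a minimal Python sketch that flags and masks a few identifier types before text is pasted. The patterns, labels and function names (`PATTERNS`, `find_sensitive`, `redact`) are hypothetical and chosen for this example. They are not ONYRI's detectors, and four simple regular expressions fall far short of real coverage: names, for instance, cannot be caught this way.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete detector set).
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # UK National Insurance number: two letters, six digits, suffix A-D (no prefix validation).
    "UK_NINO": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
    # IBAN written without spaces: country code, two check digits, 11-30 characters.
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    # AWS-style access key ID: "AKIA" followed by 16 uppercase letters or digits.
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_sensitive(text):
    """Return (label, matched value) pairs for every pattern hit in the text."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


def redact(text):
    """Replace every pattern hit with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact("Contact jane.doe@example.com, NINO QQ123456C.")` returns `Contact [EMAIL], NINO [UK_NINO].` A non-empty result from `find_sensitive` is a signal to stop and clean the text before sending it.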
That's what ONYRI Sanitize is for. Its engine covers this entire list: around 38 detectors across the six families of sensitive data. It replaces each value with a reversible token before sending. Detection and the token↔value mapping stay in your browser. Only anonymized text reaches the model. ChatGPT, Claude or Gemini find only tokens — never your real information.
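The idea of a reversible token can be sketched in a few lines of Python. This is an assumption-laden illustration of the general technique, not ONYRI's implementation: the `[EMAIL_1]` token format, the single email pattern and the `tokenize` and `restore` helpers are all hypothetical.

```python
import re

# Single illustrative pattern; a real engine would combine many detectors.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def tokenize(text, pattern, label):
    """Swap each distinct match for a numbered token.

    Returns the anonymized text and a token -> original value mapping.
    The mapping is meant to stay local and is never sent with the prompt.
    """
    token_for_value = {}

    def swap(match):
        value = match.group()
        if value not in token_for_value:
            token_for_value[value] = f"[{label}_{len(token_for_value) + 1}]"
        return token_for_value[value]

    safe_text = pattern.sub(swap, text)
    mapping = {token: value for value, token in token_for_value.items()}
    return safe_text, mapping


def restore(text, mapping):
    """Put the original values back into a reply that contains tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

With the prompt `Email alice@example.com and bob@example.org; cc alice@example.com.`, `tokenize` produces `Email [EMAIL_1] and [EMAIL_2]; cc [EMAIL_1].`, so a repeated value keeps the same token and the model can still follow who is who. If the reply is `Draft sent to [EMAIL_2].`, `restore` turns it back into `Draft sent to bob@example.org.` on the local side.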
Frequently asked questions
- What data should you never put in ChatGPT?
  Ten types above all: passwords and API keys, banking data, health data, national IDs, third-party personal data, contracts and trade secrets, source code, a full name tied to contact details, session data, and anything you would redact in a public document. The UK's NCSC advises sending nothing that would cause a problem if the query became public.
- Why is it risky to paste sensitive data into an AI?
  Because the query is visible to the provider, stored, and often used to improve the model. By default, ChatGPT trains on conversations from personal accounts unless you opt out, and that opt-out only applies going forward. Stored data can also be hacked, or frozen by a court order, as in The New York Times v. OpenAI.
- How can I use AI without exposing this data?
  Apply minimisation: send only what's needed and strip identifiers before sending. An anonymization engine replaces each sensitive value with a reversible token in the browser. The AI then receives only anonymized text, never the real values.
Sources & references
- ChatGPT and large language models: what's the risk? (don't include sensitive information; queries are visible, stored, trained on, hackable) — National Cyber Security Centre (UK)
- What is special category data? (Article 9 special categories: health, biometrics, origin, opinions, sex life…) — Information Commissioner's Office (ICO)
- Samsung Bans ChatGPT Among Employees After Sensitive Code Leak (semiconductor code + meeting transcript, April 2023) — Forbes
Keep your sensitive data in your browser
ONYRI Sanitize detects and masks your sensitive data, from names to API keys, before it reaches the AI, then restores the real values in the answer.