Guide · 8 min read

Here Are the 8 Sensitive Files You Should Never Upload to AI

Never upload a customer spreadsheet, a contract, a financial statement or a medical file to a public AI. Here are the 8 riskiest, ranked, and the fix.

By Pierre de ONYRI

Some files should never be uploaded to a public AI. A file is not a prompt. One spreadsheet can hold thousands of names, emails and numbers. A single upload can leak far more than a typed sentence. Here are the eight riskiest files, ranked from worst to least bad. The rule is simple. If a file would need redacting before you shared it, don't upload it — anonymize it first.

The ranking at a glance

Here is the ranking, from highest risk to lowest. Risk here blends two things: how much sensitive data the file holds, and how badly a leak would hurt. Each line also shows how ONYRI covers that file before you upload it.

  1. Spreadsheets of customer or staff data. One file can hold tens of thousands of rows of personal data. ONYRI scans tables cell by cell and masks each value.
  2. Contracts and legal documents. They carry trade secrets, terms and named parties. ONYRI masks names, companies and identifiers before upload.
  3. Financial statements and tax documents. Bank details and figures open a path to fraud. ONYRI masks account numbers, IBANs and amounts.
  4. Medical records. Health data is among the most protected of all, and a public chatbot signs no healthcare contract. ONYRI flags health details first.
  5. HR files: payroll and performance reviews. They mix salaries, addresses and private notes on real people. ONYRI masks names, pay and contact details.
  6. Source code and config files with secrets. One leaked key can open your whole system, the single most damaging leak on the list. ONYRI catches API keys, tokens and credentials.
  7. Scans of ID documents: passports and licences. A stolen ID number fuels identity theft. ONYRI detects ID and document numbers.
  8. Strategic and internal documents: board decks and roadmaps. They hand your plans to anyone who reads them. ONYRI masks internal names, projects and figures.

Rank | Item | Why it's risky
1 | Customer / staff spreadsheets | Tens of thousands of PII rows in one file
2 | Contracts, legal documents | A revealed secret loses its value for good
3 | Financial statements, tax files | Bank details open a path to fraud
4 | Medical records | Highly protected data; no healthcare contract
5 | HR files (payroll, reviews) | Salaries and private notes on real people
6 | Source code, config with secrets | One leaked key can open your whole system
7 | ID scans (passport, licence) | A stolen ID number fuels identity theft
8 | Strategic, internal documents | Hand your plans to whoever reads them

Ranked by combined risk: how much sensitive data the file holds and how badly a leak hurts. Sources: IBM, GitGuardian and the U.S. FTC.

The top of the list: the most exposed files

Rank one is the customer spreadsheet. This is where a file beats a prompt by far. A single sheet can hold tens of thousands of rows, across many columns. Each cell is a piece of personal data. So one upload carries far more than a whole day of typing. No human review scales to that volume. In IBM's 2024 breach report, customer personal data was the record type found in the most breaches — 46% of them. Firms that lost it faced heavier regulatory scrutiny and fines. The same report put the average breach cost at 4.88 million dollars, up 10% in a year. Our guide on anonymizing a spreadsheet before AI shows how to clean one fast.
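
To make the idea of masking a table column by column concrete, here is a minimal Python sketch. It uses only the standard library. The function name `mask_columns`, the sample column names and the `NAME_1` token format are all hypothetical choices for illustration. It is not a description of how ONYRI works internally.

```python
import csv
import io


def mask_columns(csv_text, sensitive_columns):
    """Replace every value in the given columns with a stable token.

    Returns the masked CSV text and the (column, value) -> token table.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()

    tokens = {}  # (column, original value) -> token
    for row in reader:
        for column in sensitive_columns:
            value = row.get(column)
            if value:
                key = (column, value)
                if key not in tokens:
                    # Number tokens per column so the same value always
                    # maps to the same token.
                    count = sum(1 for col, _ in tokens if col == column) + 1
                    tokens[key] = f"{column.upper()}_{count}"
                row[column] = tokens[key]
        writer.writerow(row)
    return out.getvalue(), tokens


# Made-up sample data, not a real customer file.
sample = (
    "name,email,plan\n"
    "Alice,alice@example.com,pro\n"
    "Bob,bob@example.org,free\n"
)
masked_csv, token_table = mask_columns(sample, ["name", "email"])
print(masked_csv)
```

The token table should stay on your machine. Only the masked text is meant to leave it. Note that this sketch masks whole columns you name yourself; it does not detect personal data hidden in free-text cells.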

Next come contracts and financial files. A contract holds trade secrets, pricing and named parties. Once a secret is out, it can lose its value for good. Financial and tax files add another risk. Bank details and figures give fraudsters a direct path. IBM's 2024 report priced a stolen intellectual-property record at 173 dollars, up nearly 11% in a year. Before you paste deal terms into a chatbot, read our note on whether it's safe to use AI for contracts.

Medical records sit near the top for a reason. Health data is among the most protected categories in the law. Public chatbots like ChatGPT are not HIPAA-compliant; HIPAA is the U.S. health privacy law. These tools do not sign a Business Associate Agreement (BAA), the contract HIPAA requires before a vendor may handle patient data. So entering patient health information can count as an unauthorized disclosure, which is in effect a data breach.

The rest of the list: HR, code and IDs

HR files round out the personal-data risks. Payroll sheets list salaries. Reviews hold blunt notes on named staff. These are other people's private details, placed in your care. Upload them and you expose data that isn't yours to share.

Source code and config files are a special case. They often hide a secret in plain sight: an API key, a token, a password. One leaked key can open your whole system. The scale is real. GitGuardian scanned 1.1 billion public GitHub commits made in 2023. It found 12.8 million new secrets leaked, up 28% in a year. It also saw a 1,212-fold surge in leaked OpenAI keys. And over 90% of exposed secrets still worked five days after they leaked. In 2023, Samsung engineers pasted semiconductor source code and meeting notes into ChatGPT. Samsung then banned generative AI on company devices.
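
A quick check before sharing a code or config file can catch the obvious cases. The Python sketch below is a rough illustration under stated assumptions: the three patterns are simplified examples, the names `find_secrets` and `redact_secrets` are hypothetical, and a real scanner relies on far larger rule sets. Treat a clean result as "nothing obvious found", not as proof the file is safe.

```python
import re

# Simplified, illustrative patterns only. Real secret scanners use
# hundreds of provider-specific rules plus entropy checks.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]?[^\s'\"]{8,}",
        re.IGNORECASE,
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def redact_secrets(text):
    """Replace each whole match with a placeholder naming the rule."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


# Made-up config content with fake credentials.
config = 'DEBUG = True\nAPI_KEY = "sk_test_abcdefgh12345678"\n'
print(find_secrets(config))
print(redact_secrets(config))
```

Redacting the whole match also removes the setting name, which is a deliberate simplification here. If a key has already been pasted somewhere public, masking it afterwards is not enough; it should be revoked and replaced.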

Then come scans of ID documents. A passport or licence carries a number built to prove who you are. The U.S. Federal Trade Commission explains how thieves use stolen identity data. They open accounts, file tax returns, get medical care, or take loans in your name. The FTC points victims to IdentityTheft.gov to report it.

Last are strategic and internal documents. Board decks and roadmaps hold your plans and your weak spots. They rarely contain your own personal data. But they can hand a competitor your next move. The Samsung leak included internal meeting notes, not just code.

[Figure: two-part diagram. Top: a stack of files in the clear (amber) is uploaded to an AI panel that keeps the content readable. Bottom: the same stack, anonymized, shows only tokens (cobalt), and the AI panel displays just a checkmark, with nothing usable.]
Sources: IBM (Cost of a Data Breach 2024), GitGuardian (State of Secrets Sprawl 2024) and the U.S. FTC. The file stays useful, but anonymization neutralizes the exposure.

There's a deeper reason files are risky. On consumer plans, your content may be used to train the model unless you opt out. OpenAI states that it does not train on data from its business products (Business, Enterprise and the API) by default. Temporary Chats are kept out of your chat history and are not used for training. Our article on whether it's safe to upload documents to ChatGPT digs into this.

How to use this: the fix

The fix is not to avoid AI. It's to clean the file first. This step is called data minimisation. You remove or mask identifiers before the file leaves your machine. The less personal data you send, the smaller your risk. Our guide on anonymizing a document before AI walks through the steps.

The law backs this up. Under the GDPR, fully anonymized data is no longer personal data. But pseudonymized data still counts as personal — identifiers swapped for tokens stay regulated. So masking is not a magic wand. The real gain is simple: send less, and keep the link between token and value on your side.
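
Here is a minimal Python sketch of that idea: swap values for tokens, and keep the token-to-value table local. It handles email addresses only, with a deliberately simple pattern. The function names `pseudonymize` and `restore` and the `<EMAIL_1>` token format are hypothetical, chosen for illustration rather than taken from any product.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text):
    """Swap each email for a token.

    Returns (masked_text, mapping) where mapping is token -> original value.
    The mapping is the part that must stay on your side.
    """
    mapping = {}   # token -> original value
    reverse = {}   # original value -> token

    def replace(match):
        value = match.group(0)
        if value not in reverse:
            token = f"<EMAIL_{len(reverse) + 1}>"
            reverse[value] = token
            mapping[token] = value
        return reverse[value]

    return EMAIL_RE.sub(replace, text), mapping


def restore(text, mapping):
    """Put the original values back into text that contains tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


# Made-up example text.
original = "Contact alice@example.com or bob@example.org for the renewal."
masked, mapping = pseudonymize(original)
print(masked)                    # what would be sent to the AI
print(restore(masked, mapping))  # what you see after restoring locally
```

Because the mapping can reverse the tokens, the masked text is pseudonymized, not anonymized, in the GDPR sense described above. The gain comes from sending less and from never sending the mapping.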

  • Never upload a raw customer or HR spreadsheet — mask the columns first.
  • Strip API keys and passwords from any code or config file.
  • Remove names, IDs and account numbers from contracts and financial files.
  • For a health or ID file, don't paste it into a consumer chatbot at all.
  • When in doubt, run the redaction test: would you share this file in public?

That's exactly what ONYRI Sanitize does. It detects sensitive data in text and tables, then replaces each value with a reversible token. Detection and the token↔value mapping stay in your browser. Only anonymized text ever reaches the AI tool. Whether you use ChatGPT, Claude or Gemini, it sees tokens, never your real files.

Frequently asked questions

Is it safe to upload files to ChatGPT?
Not without care. A file often carries far more personal data than a typed prompt: one spreadsheet can hold tens of thousands of rows. On consumer accounts, your content may feed training unless you opt out, and stored data can be reviewed or hacked. The safe rule: anonymize the file before sending, or use a business plan with a contract.

Which files should you never upload to AI?
Eight above all: customer or staff spreadsheets, contracts and legal documents, financial and tax statements, medical records, HR files, source code with secrets, ID document scans, and internal strategic documents. Each exposes either third-party personal data or a secret whose leak cannot be undone.

How can I use AI on a sensitive file without exposing it?
Apply minimisation: keep only what's needed and mask identifiers before sending. An anonymization engine replaces each sensitive value with a reversible token in the browser, in text and in tables alike. The AI then receives only anonymized content, never the real values in the file.

Keep your sensitive data in your browser

ONYRI Sanitize detects and masks your sensitive data before it reaches the AI, then restores the answer — from names to API keys.
