Guide · 6 min read

How to anonymize a document before using AI

A contract, report or PDF handed to AI holds names, addresses and amounts. The method to anonymize it before ChatGPT, Claude or Gemini.

By Alexis from ONYRI

To have AI summarize, translate or analyze a document without exposing its content, anonymize it first: an engine detects names, addresses, amounts and identifiers, replaces them with tokens, sends only the neutralized text, then restores the answer in your browser. ChatGPT, Claude or Gemini work on a structurally identical document, stripped of any identifying data.

Why a document is riskier than a simple prompt

When you type a prompt, you choose your words. When you paste a whole document — contract, report, letter — you also send everything you no longer re-read: signatories, addresses, references, amounts. A document is a stack of identifiers; that's exactly what gets forgotten when you copy it in one block.

  • Identities and contact details: signatories, recipients, third parties cited.
  • References: contract, file and internal identifiers.
  • Amounts and quantified clauses that reveal a relationship or situation.
  • A file's technical data: headers, internal links, metadata.

[Diagram: a document whose name and amount are replaced by tokens passes through an anonymization gate; only tokens come out, and the mapping stays in your browser.]

Copy-pasting the text isn't enough

Two classic traps. First, manual redaction is partial: you strike one name, miss three, and the leftovers allow re-identification by cross-referencing. Second, visually masking text in a PDF isn't enough: a black rectangle placed on top often leaves the text selectable underneath. Automatic detection removes the information itself, not just its appearance.
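
The first trap can be made concrete with a small check. The sketch below is only an illustration: the function name `find_leaks`, the sample sentence and the person in it are hypothetical, and a simple substring search stands in for real detection. It lists the sensitive values that are still present in a text after a manual redaction pass.

```python
def find_leaks(outgoing_text, sensitive_values):
    """Return the sensitive values still present in the text about to be sent."""
    lowered = outgoing_text.casefold()
    return [value for value in sensitive_values if value.casefold() in lowered]


# Hypothetical example: the full name was struck once by hand,
# but the surname and the email address were missed.
redacted = (
    "Signed by [REDACTED]. Ms. Durand can be reached at "
    "marie.durand@example.com."
)
leaks = find_leaks(
    redacted, ["Marie Durand", "Durand", "marie.durand@example.com"]
)
print(leaks)
```

Here the check reports the surname and the email address as still present, which is the kind of leftover that makes cross-referencing possible.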

The method: detect, tokenize, restore

  1. Detection: the engine spots every identifier in the document, including those without an obvious keyword.
  2. Tokenization: each becomes a neutral, consistent token, kept in local memory.
  3. Sending: only the anonymized text goes to the AI; the original document with its identifiers is never transmitted.
  4. Restoration: the answer (summary, translation, analysis) is de-tokenized in your browser.
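
The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not ONYRI's engine: the patterns, the token format `[LABEL_n]` and the function names `tokenize` and `restore` are assumptions made for the example. Regular expressions only catch identifiers with a recognizable shape (email addresses, amounts, reference codes); names and postal addresses would need a more capable detector.

```python
import re

# Hypothetical patterns, for illustration only. A real engine would cover
# many more identifier types (names, postal addresses, phone numbers).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "AMOUNT": re.compile(r"(?:€|\$)\s?\d[\d,]*(?:\.\d+)?"),
    "REF": re.compile(r"\b[A-Z]{2,5}-\d{3,}\b"),
}


def tokenize(text):
    """Replace detected identifiers with tokens; return (safe_text, mapping)."""
    mapping = {}   # token -> original value, kept locally, never sent
    reverse = {}   # original value -> token, so repeats get the same token
    counters = {}  # per-label counter used to number the tokens

    def make_replacer(label):
        def replace(match):
            value = match.group(0)
            if value not in reverse:
                counters[label] = counters.get(label, 0) + 1
                token = f"[{label}_{counters[label]}]"
                reverse[value] = token
                mapping[token] = value
            return reverse[value]
        return replace

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, mapping


def restore(answer, mapping):
    """Put the original values back into the AI's answer."""
    for token, value in mapping.items():
        answer = answer.replace(token, value)
    return answer


document = (
    "Contract CT-2024 for $12,500.00. "
    "Contact jane.doe@example.com about CT-2024."
)
safe_text, mapping = tokenize(document)
print(safe_text)  # only this text would be sent to the AI
print(restore("Summary: [REF_1] is worth [AMOUNT_1].", mapping))
```

Because the same value always maps to the same token, the AI can still tell that both mentions of the contract reference point to one contract. The mapping never leaves the local side, so only the party holding it can turn the answer back into real values.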

ONYRI Sanitize detects a document's identifiers — identities, contact details, references, amounts, technical secrets — and restores the answer in your browser. You have AI summarize, translate or analyze your documents without ever exposing their sensitive content.

Frequently asked questions

How do I anonymize a document before giving it to ChatGPT?
Have its identifiers (names, addresses, references, amounts) detected and replaced with tokens before sending, then restore the answer in your browser. AI works on a neutralized but structurally identical document: the summary or analysis stays relevant.

Isn't blacking out the sensitive parts in the PDF enough?
No. A black rectangle placed on a PDF often leaves the text selectable underneath: the information is hidden from the eye, not removed. Anonymization replaces the value itself with a token, leaving nothing usable.

Is the document still usable after anonymization?
Yes. Tokens are consistent and the structure is preserved, so AI reasons normally. After restoration in your browser, you get a complete result tied back to the real values.
