Guide · 7 min read

Anonymize health data before handing it to AI

Health data is a GDPR special category. How to anonymize a clinical note or a patient letter before sending it to ChatGPT, Claude or Gemini.

By Alexis de ONYRI

Health data is a “special category” under the GDPR: its processing is tightly regulated, and pasting a clinical note into a consumer assistant means disclosing it to a third party. Before asking an AI to rephrase a letter or summarize a file, replace the patient's identity, contact details, social-security number and identifying clinical elements with tokens. The model works on the masked text; no identifying patient data leaves the browser.

Why health data isn't like other data

The GDPR prohibits processing health data as a matter of principle, except under defined exceptions (care, explicit consent…). In the United States, the HIPAA “Safe Harbor” method points the same way: it lists eighteen types of identifiers to remove for a record to count as de-identified. The shared message: before any external processing, strip what ties the information to a person.

  • Direct identity: name, date of birth, address, social-security number.
  • Care identifiers: file, insurance and admission numbers.
  • Rare clinical elements that, alone, can re-identify (uncommon condition, precise dates).
  • Contact details of the patient and their relatives.

Figure: a patient file whose identity and clinical lines are masked by tokens passes through an anonymization gate. Only masked text reaches the AI; the mapping stays local.

What to remove before any prompt

  1. The patient's identity and that of third parties (relatives, other patients named).
  2. All administrative identifiers: social-security, file, insurance numbers.
  3. Contact details: address, phone, email.
  4. Precise dates and locations that, combined with context, re-identify.
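
Some of these items follow predictable formats and can be spotted with pattern matching. The sketch below is a minimal illustration in Python, not the method of any particular tool: the `PATTERNS` table and the `find_identifiers` function are hypothetical names, and the patterns are deliberately simplistic. Names, postal addresses and national identifier formats would need dedicated detection that regular expressions alone do not provide.

```python
import re

# Illustrative patterns only: real identifiers vary by country and format,
# and personal names cannot be caught reliably with regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d .-]{7,}\d"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs in order of appearance."""
    hits = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), category, match.group()))
    hits.sort()  # order by position in the text
    return [(category, value) for _, category, value in hits]


note = "Seen on 12/03/2024. Contact: jane.doe@example.org, tel. +33 6 12 34 56 78."
for category, value in find_identifiers(note):
    print(category, value)
```

A pattern table like this only covers the formatted items in the list; anything it misses still has to be caught by another detector or by a human review before the text is sent.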

A flow that preserves medical confidentiality

  1. Detection: the engine spots identity, care identifiers and identifying elements.
  2. Tokenization: each element becomes a neutral token, and the token-to-value mapping is kept in local memory.
  3. Sending: only the anonymized text goes to the AI; the identifying data never leaves your browser.
  4. Restoration: the answer is de-tokenized in your browser, tied to the right file.
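
The tokenization and restoration steps can be sketched as a round trip. This is a minimal Python illustration under stated assumptions, not ONYRI Sanitize's implementation: the `tokenize` and `restore` functions, the `[EMAIL_n]` token format and the single email pattern are all hypothetical, and a browser implementation would follow the same shape in JavaScript.

```python
import re

# Hypothetical single detector: a toy pattern for email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def tokenize(text: str, pattern: re.Pattern, label: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a token; return the masked text and the mapping."""
    mapping: dict[str, str] = {}  # token -> original value (stays local)
    seen: dict[str, str] = {}     # original value -> token, so repeats share a token

    def substitute(match: re.Match) -> str:
        value = match.group()
        if value not in seen:
            token = f"[{label}_{len(seen) + 1}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return pattern.sub(substitute, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's answer."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


original = "Write to a@b.org or c@d.org, then a@b.org again."
masked, mapping = tokenize(original, EMAIL, "EMAIL")
print(masked)                    # the only text that would be sent to the model
print(restore(masked, mapping))  # restoration happens locally, using the mapping
```

Reusing the same token for a repeated value keeps the masked text coherent for the model, and because the mapping is never sent, the tokens alone reveal nothing about the original values.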

ONYRI Sanitize detects a file's identifying data (identity, social-security number, contact details, medical elements), masks it before sending, and restores the answer in your browser. Care teams get AI's help to rephrase or summarize without exposing a patient or breaking medical confidentiality.

Frequently asked questions

Can I use ChatGPT to draft a medical letter?
Yes, provided no identifying data is sent. Anonymize the patient's identity, identifiers and contact details before sending: AI works on the clinical content, and you restore the answer in the browser. Final validation remains the health professional's responsibility.
Is removing the patient's name enough?
No. Re-identification can come from a combination: date of birth, postal code, rare condition, admission dates. That's the point of an engine that detects all identifiers, including administrative numbers, rather than a partial manual deletion.
Does medical confidentiality apply to AI tools?
Yes. Confidentiality doesn't depend on the channel: handing identifying health data to a third-party assistant is a disclosure. Anonymizing before sending is the most direct way to avoid that disclosure.
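
To illustrate why removing the name is not enough, the following Python sketch counts how many records share each combination of attributes. Everything in it is an assumption made for the example: the records are invented, and `singled_out` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import Counter

# Made-up records with names already removed (illustration only).
records = [
    {"birth_year": 1961, "postal_code": "75011"},
    {"birth_year": 1961, "postal_code": "69003"},
    {"birth_year": 1961, "postal_code": "69003"},
    {"birth_year": 1984, "postal_code": "75011"},
    {"birth_year": 1984, "postal_code": "75011"},
]


def singled_out(rows, keys):
    """Return the rows whose combination of `keys` is unique in the dataset."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


print(singled_out(records, ["birth_year"]))                 # no unique row
print(singled_out(records, ["postal_code"]))                # no unique row
print(singled_out(records, ["birth_year", "postal_code"]))  # the first record
```

Neither attribute isolates anyone on its own, yet together they point to a single record, which is the situation the second answer above describes.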

Keep your sensitive data in your browser

ONYRI Sanitize detects and masks your sensitive data before it reaches the AI, then restores the answer — from names to API keys.
