What is sensitive data? A practical taxonomy
Sensitive data: an operational definition and a six-family taxonomy, from names to API keys. A reference for knowing what to protect before using AI.
Sensitive data is any information whose disclosure creates risk — for a person (privacy) or for the company (legal, financial, competitive). In practice it goes well beyond names and emails: it can be sorted into six families, from personal identities to technical secrets like API keys. Knowing these families means knowing what to protect before sending text to an AI assistant.
1. Personal data (PERSONAL)
Anything that identifies a person, directly or indirectly: full name, email, phone, postal address, national ID, identity document, medical information. It is the most heavily regulated family (notably under the GDPR) and the first to protect.
2. Financial data (FINANCIAL)
IBANs, bank details, BIC/SWIFT, card numbers, tax identifiers, salary amounts. Leaking them exposes you to fraud and reveals confidential information about people and the company alike.
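Many financial identifiers carry a built-in checksum, which helps a detector tell a real card number or IBAN from an arbitrary run of digits. The sketch below shows the two standard checks (Luhn for card numbers, mod 97 for IBANs). The function names are illustrative assumptions, not part of any product API, and a production detector would also check country-specific IBAN lengths.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # too short to be a card number
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if the string has an IBAN shape and passes the mod 97 check."""
    compact = re.sub(r"\s+", "", iban).upper()
    # Two-letter country code, two check digits, then 11 to 30 alphanumerics.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A candidate that matches the expected shape but fails the checksum is probably not a real identifier, which cuts down on false positives when scanning a prompt.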
3. Technical secrets (TECHNICAL)
API keys and access tokens (cloud, payment platforms, AI providers), SSH private keys, JWTs. A code snippet or config file pasted into an AI assistant often carries an embedded secret that grants direct access to your systems. It is arguably the most dangerous blind spot.
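Several technical secrets have a recognizable shape, so a first line of defense can be plain pattern matching. The sketch below assumes three illustrative patterns (a PEM private key header, the three-segment JWT shape, and the `AKIA` prefix of AWS access key IDs). The names `SECRET_PATTERNS` and `find_secrets` are hypothetical, and real coverage would need many more provider-specific formats.

```python
import re

# Illustrative patterns only; each one targets the visible shape of a secret.
SECRET_PATTERNS = {
    # PEM headers such as "-----BEGIN OPENSSH PRIVATE KEY-----"
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    # Three base64url segments; header and payload are JSON, hence "eyJ"
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_secrets(text: str) -> list[tuple[str, int, int]]:
    """Return (label, start, end) for each match, ordered by position."""
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

Returning positions rather than the matched text means the scanner never has to copy the secret into logs or reports.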
4. Login credentials (CREDENTIAL)
Clear-text passwords, one-time passwords (OTP), multi-factor authentication (MFA) secrets. They have no reason to ever be sent to a third-party service.
5. Corporate data (CORPORATE)
Registration numbers, legal entity names, entity identifiers. Harmless in isolation, they map your business relationships once cross-referenced.
6. Strategic data (STRATEGIC)
Client and project names, internal URLs, meeting markers, roadmap items. Not “personal” under the GDPR, but leaking them carries a real competitive cost.
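The six families above can be represented as a small data structure, so that every detected item is tagged with the family it belongs to. This is a minimal sketch: the family labels come from the headings above, while the entity type names in `ENTITY_FAMILY` and the helper `families_in` are assumptions made for illustration.

```python
from enum import Enum


class Family(Enum):
    PERSONAL = "PERSONAL"
    FINANCIAL = "FINANCIAL"
    TECHNICAL = "TECHNICAL"
    CREDENTIAL = "CREDENTIAL"
    CORPORATE = "CORPORATE"
    STRATEGIC = "STRATEGIC"


# Hypothetical entity types, each mapped to one family of the taxonomy.
ENTITY_FAMILY = {
    "email": Family.PERSONAL,
    "phone": Family.PERSONAL,
    "iban": Family.FINANCIAL,
    "card_number": Family.FINANCIAL,
    "api_key": Family.TECHNICAL,
    "ssh_private_key": Family.TECHNICAL,
    "password": Family.CREDENTIAL,
    "otp": Family.CREDENTIAL,
    "registration_number": Family.CORPORATE,
    "project_name": Family.STRATEGIC,
    "internal_url": Family.STRATEGIC,
}


def families_in(entity_types: list[str]) -> set[Family]:
    """Return the families touched by a list of detected entity types."""
    return {ENTITY_FAMILY[t] for t in entity_types if t in ENTITY_FAMILY}
```

Grouping by family rather than by individual entity type keeps policies short: a team can decide, for example, that anything in the TECHNICAL or CREDENTIAL family is always masked.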
How to use this taxonomy
- Audit your usage: which families do your teams handle in their prompts?
- Enable the matching detectors instead of handling everything case by case.
- Add custom rules for your project names and internal codes.
- Adapt to the country: national identifiers (social security, tax) differ by jurisdiction.
ONYRI Sanitize organizes its detection around exactly these six families, with country-aware rules (FR, US…) and the ability to add your own. It puts one promise into practice: “your sensitive data never leaves the browser, from names to API keys.”
Frequently asked questions
- Is sensitive data only personal data?
- No. Personal data is just one of six families. Technical secrets (API keys), login credentials and strategic data are equally sensitive, sometimes more so for the company.
- Why are API keys so critical?
- An API key or token gives direct access to your systems. Slipped into a prompt with a code snippet, it can leak unnoticed — hence the value of entropy-based detection.
- Is corporate data covered by the GDPR?
- Not directly, as long as it doesn't identify a person. But leaking it still carries a competitive cost, which is why a good taxonomy goes beyond the scope of the GDPR alone.
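The entropy-based detection mentioned in the FAQ rests on a simple observation: randomly generated keys use many different characters with roughly equal frequency, while ordinary words and identifiers repeat a few. A minimal sketch using Shannon entropy follows. The 20-character minimum and the 4.0 bits-per-character threshold are assumptions chosen for illustration, not values taken from any product, and they would need tuning against real data.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of the string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Candidate tokens: long runs of characters commonly found in keys.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")


def high_entropy_tokens(text: str, threshold: float = 4.0) -> list[str]:
    """Return long tokens whose character distribution looks random."""
    return [
        token
        for token in TOKEN_RE.findall(text)
        if shannon_entropy(token) >= threshold
    ]
```

Entropy catches keys with no known prefix or format, but it also flags hashes and other random-looking strings, so it works best as a complement to pattern-based rules rather than a replacement.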
Keep your sensitive data in your browser
ONYRI Sanitize detects and masks your sensitive data before it reaches the AI, then restores the answer — from names to API keys.