Sensitive data: United States vs France, what changes
SSN, EIN, ZIP in the US; social security number, SIREN, IBAN in France. National identifiers differ by country, which is why detection must adapt.
Sensitive data doesn't take the same shape in the United States and in France: a US SSN (9 digits) is nothing like a French social security number (15 digits), and an EIN or a ZIP code does not map one-to-one onto French formats (SIREN, postal code, IBAN). Practical consequence: an effective detection engine must be country-aware. Either it adapts its rules to the country, or it misses identifiers and multiplies false positives.
Personal identifiers: different formats
- United States: SSN (9 digits), often written with dashes, as in 123-45-6789.
- France: social security number (15 characters: a 13-character body followed by a 2-digit check key).
- Phone numbers, addresses and dates: distinct conventions and formats (for dates, MM/DD in the US vs DD/MM in France).
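To make the format difference concrete, here is a minimal Python sketch of both detectors. The names `US_SSN`, `FR_NIR`, `nir_key`, `find_us_ssn` and `find_fr_nir` are hypothetical, chosen for illustration. The sketch assumes the dashed SSN form and a compact French number (no spaces) starting with 1 or 2. It is a simplification, not a complete validator.

```python
import re

# US SSN in its dashed form, e.g. 123-45-6789 (format check only).
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# French number: 13-character body (group 1) + 2-digit check key (group 2).
# The department part of the body can be 2A or 2B for Corsica.
FR_NIR = re.compile(r"\b([12]\d{4}(?:\d{2}|2[AB])\d{6})(\d{2})\b")


def nir_key(body: str) -> int:
    """Check key = 97 - (body mod 97); 2A and 2B count as 19 and 18."""
    numeric = body.replace("2A", "19").replace("2B", "18")
    return 97 - int(numeric) % 97


def find_us_ssn(text: str) -> list[str]:
    """Return dashed SSN candidates found in the text."""
    return US_SSN.findall(text)


def find_fr_nir(text: str) -> list[str]:
    """Keep only candidates whose check key matches the body."""
    return [
        m.group(0)
        for m in FR_NIR.finditer(text)
        if nir_key(m.group(1)) == int(m.group(2))
    ]
```

With the commonly published sample number 2 69 05 49 588 157 80 written without spaces, `find_fr_nir("269054958815780")` keeps the match because the body leaves a remainder of 17 modulo 97, giving a key of 80. Changing the last digit makes the candidate disappear, which is how a check key cuts false positives on arbitrary 15-digit strings.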
Corporate and tax identifiers
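- United States: EIN (Employer Identification Number, 9 digits), bank routing and account numbers.
- France: SIREN (9 digits) and SIRET (14 digits), IBAN, tax identifier.
- Addresses: ZIP code (5 digits, optionally extended as ZIP+4) vs French postal code (5 digits, where the first two generally indicate the département).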
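The identifiers above can be checked in different ways, as this hedged Python sketch shows: an EIN is matched by shape only (two digits, a dash, seven digits), a SIREN is validated with the Luhn checksum, and an IBAN with the ISO 13616 mod-97 test. The names `US_EIN`, `luhn_ok`, `siren_ok` and `iban_ok` are hypothetical. The IBAN check here does not verify per-country lengths.

```python
import re

# EIN written as NN-NNNNNNN (format check only, no checksum exists).
US_EIN = re.compile(r"\b\d{2}-\d{7}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def siren_ok(value: str) -> bool:
    """A SIREN is 9 digits that pass the Luhn checksum."""
    return re.fullmatch(r"[0-9]{9}", value) is not None and luhn_ok(value)


def iban_ok(value: str) -> bool:
    """Mod-97 test: move the first 4 characters to the end, map letters
    to numbers (A=10 ... Z=35), and expect a remainder of 1."""
    s = value.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    as_digits = "".join(str(int(c, 36)) for c in rearranged)
    return int(as_digits) % 97 == 1
```

For example, `siren_ok("732829320")` and `iban_ok("FR76 3000 6000 0112 3456 7890 189")` both return `True`, while altering the last digit of either value makes the check fail. The EIN pattern, by contrast, accepts any string of the right shape, so it benefits most from surrounding context such as a nearby "EIN" label.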
Why detection must be country-aware
Applying French rules to a US text (or vice versa) creates two problems: leaks, where an unrecognized SSN isn't masked, and noise, where harmless digit strings are wrongly taken for identifiers. Adapting detectors to the country (formats, lengths, check keys) improves both recall and precision.
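The leak and the noise can both be reproduced with a small Python sketch. The `DETECTORS` registry and the `mask` function are hypothetical names, and the patterns are deliberately format-only (no check keys), which is an assumption made to keep the example short.

```python
import re

# Hypothetical per-country registry of format-only patterns.
DETECTORS = {
    "US": {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "EIN": re.compile(r"\b\d{2}-\d{7}\b"),
    },
    "FR": {
        "NIR": re.compile(r"\b[12]\d{4}(?:\d{2}|2[AB])\d{8}\b"),
        "SIREN": re.compile(r"\b\d{9}\b"),
    },
}


def mask(text: str, country: str) -> str:
    """Replace every match of the selected country's rules with a label."""
    for label, pattern in DETECTORS[country].items():
        text = pattern.sub(f"[{label}]", text)
    return text


us_text = "Employee SSN 123-45-6789, invoice 204518837."

print(mask(us_text, "US"))  # Employee SSN [SSN], invoice 204518837.
print(mask(us_text, "FR"))  # Employee SSN 123-45-6789, invoice [SIREN].
```

With the US rules, the SSN is masked and the invoice number is left alone. With the French rules applied to the same text, the SSN leaks through and the 9-digit invoice number is mislabeled as a SIREN.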
Beyond FR and US
Many organizations operate across several countries. Ideally, the user chooses the detection country: the tool provides solid coverage where country rules exist and a reasonable fallback elsewhere, and it clearly flags when a country's rules aren't yet optimal.
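One way to sketch this "coverage plus flagged fallback" behavior in Python is shown below. It is an illustration under stated assumptions, not any product's actual API: `COMMON`, `NATIONAL`, `Result` and `sanitize` are hypothetical names, the email and IBAN patterns are simplified, and only one national rule per country is included.

```python
import re
from dataclasses import dataclass

# Families that look the same in most countries (simplified patterns).
COMMON = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# National rules exist only for covered countries.
NATIONAL = {
    "US": {"SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")},
    "FR": {"NIR": re.compile(r"\b[12]\d{4}(?:\d{2}|2[AB])\d{8}\b")},
}


@dataclass
class Result:
    text: str
    coverage: str  # "full" when national rules ran, else "fallback"


def sanitize(text: str, country: str) -> Result:
    """Apply common rules everywhere, national rules where available."""
    national = NATIONAL.get(country)
    rules = {**COMMON, **(national or {})}
    for label, pattern in rules.items():
        text = pattern.sub(f"[{label}]", text)
    return Result(text, "full" if national is not None else "fallback")


sample = "Contact ana@example.com, IBAN FR7630006000011234567890189, ID 123-45-6789"
print(sanitize(sample, "US"))
print(sanitize(sample, "BR"))
```

For the uncovered country code, the email and IBAN are still masked but the national identifier is left untouched, and `coverage` comes back as `"fallback"` so the caller can warn the user instead of implying full protection.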
ONYRI Sanitize offers country-aware detection (full FR and US rules: SSN, EIN, ZIP, US dates… on the US side; social security, SIREN, IBAN… on the FR side), with the ability to add your own business rules.
Frequently asked questions
- Why not apply the same rules everywhere?
- Because national identifiers have different formats, lengths and check keys. Generic rules miss real identifiers and trigger false positives on harmless numbers.
- Are a US SSN and a French social security number the same?
- No: 9 digits in the US, 15 digits (with a check key) in France. Detecting them correctly requires country-specific rules.
- What happens for an uncovered country?
- A good tool applies a reasonable fallback (common families: email, IBAN, card…) while indicating that national-identifier coverage isn't yet optimal for that country.