Tools & AI · 7 min read

Are AI Translation Tools Safe for Confidential Documents?

No: confidential documents pasted into a free AI translator have already ended up publicly indexed by Google. Here is why it happens, and the fix that protects the content itself.

By Pierre, ONYRI

Not by default. Pasting a confidential document into a free consumer AI translator is not safe. In 2017, employees of the Norwegian oil company Statoil discovered that texts translated through the free service Translate.com had been indexed by Google and were accessible to anyone running a search: contracts, dismissal letters, emails. The cause: many free tools store and reuse submitted text, and their terms of service grant a broad license over it. Professional tiers (APIs, paid subscriptions) are generally zero-retention, but the only guarantee that does not depend on the provider's terms is to anonymize the document before translating it.

The day translated contracts showed up in Google

On 3 September 2017, Norway's public broadcaster NRK revealed that texts run through the online service Translate.com were surfacing publicly in Google search results. Statoil employees found sensitive documents there that they had assumed were confidential. The mechanism: Translate.com relied on cloud storage of submitted texts so that volunteer human translators could review them and improve quality — those texts were then indexed by search engines. In response, the Oslo Stock Exchange blocked its employees' access to the service. That's the clear answer to the question: yes, confidential data pasted into a free translator has already ended up publicly indexed.

Date | Incident | What leaked | The lesson
3 Sep 2017 | NRK reveals the Translate.com affair | Translated texts indexed and reachable via Google | A free translator is not a private vault
Sep 2017 (Statoil) | Employees find their documents online | Contracts, dismissal letters, doctor/pharma emails | Pasted content leaves your perimeter of control
Response | Oslo Stock Exchange blocks access to the service | | Organizations treat these tools as a security risk
A look back at the Translate.com incident, based on Slator's reporting (NRK, the Statoil affair).

Why free tools keep (and reuse) your text

A free consumer translator sends your text to cloud infrastructure you don't control, and its terms of service often grant broad use of it. Google's terms of service — which apply to the free public version of Google Translate — grant Google a worldwide license to host, store, reproduce, modify and create derivative works from submitted content in order to operate and improve its services. DeepL similarly distinguishes its two tiers: the Free version reserves the right to process, for a limited time, submitted texts in order to train and improve its neural networks (see its terms of use and the DeepL Pro Data Security page). Pasting a confidential document into these tools therefore moves it beyond your control.

  • Storage: the text is sent and kept on third-party servers, sometimes to be reviewed by humans.
  • Reuse: depending on the terms, it can be used to train or improve the translation models.
  • Indexing: poorly secured cloud storage can, as in 2017, end up exposed to search engines.
  • Uneven security: the European Commission notes that many free tools offer neither encryption nor robust data protection.

Consumer vs. pro: the difference is real

Not all translators are equal. For the Cloud Translation API (a paid offering), Google officially states that it does not use submitted content to train or improve its translation features, does not retain the text persistently (it's held briefly, just long enough to produce the translation), claims no ownership over it, and neither shares nor makes it public — a policy that concerns the API, not the consumer widget. For its part, DeepL's Pro version doesn't keep texts long-term, deletes texts and translations after the service runs, and doesn't use them to improve quality (see DeepL Help Center, infrastructure and data protection). Good hygiene: favor a platform with a zero-retention policy, ideally encrypted and certified (ISO 27001, SOC 2).

[Figure: two-stage diagram. Top: a confidential document (amber) pasted into an online translator surfaces, exposed, in search results. Bottom: the same document anonymized into tokens (cobalt) reaches the translator with nothing usable, confirmed by a checkmark.]
Sources: Slator's reporting (the Statoil affair), Google Cloud documentation (Cloud Translation) and the European Commission; pro offerings cited by name (Google Cloud Translation API, DeepL Pro).

The fix: anonymize the document before translating

Since a translator's terms and retention aren't under your control, the only guarantee is about the content: if the document contains no sensitive data in the clear, then cloud storage, accidental indexing or a human reviewer expose nothing usable. The steps are simple:

  1. Spot the sensitive items: names, identifiers, amounts, contact details, internal references.
  2. Replace them with reversible tokens before sending the text to the translator.
  3. Translate the neutralized text: the tool only sees tokens, never the real information.
  4. Restore the original values in the translation, in your browser.
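
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the `[[KIND_n]]` token format, the two regular expressions, the `known_names` list and the `fake_translate` stand-in are all hypothetical, and real detection needs much broader coverage than two patterns.

```python
import re

# Hypothetical detection rules, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "AMOUNT": re.compile(r"\b\d[\d,]*(?:\.\d+)?\s?(?:EUR|USD|NOK)\b"),
}


def anonymize(text, known_names=()):
    """Steps 1 and 2: spot sensitive items and swap them for reversible tokens."""
    mapping = {}   # token -> original value; this must stay on your machine
    counters = {}  # per-kind counter used to number the tokens

    def token_for(kind, value):
        # Reuse the same token when the same value appears more than once.
        for token, original in mapping.items():
            if original == value:
                return token
        counters[kind] = counters.get(kind, 0) + 1
        token = f"[[{kind}_{counters[kind]}]]"
        mapping[token] = value
        return token

    # Pattern-based items first, so names cannot break up an e-mail address.
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: token_for(k, m.group(0)), text)
    # Then names supplied by the user, longest first.
    for name in sorted(known_names, key=len, reverse=True):
        if name in text:
            text = text.replace(name, token_for("NAME", name))
    return text, mapping


def restore(text, mapping):
    """Step 4: put the original values back into the translated text."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def fake_translate(text):
    """Step 3 stand-in for a real translator (assumption: tokens pass through)."""
    return text.replace("Please pay", "Bitte zahlen Sie").replace(" to ", " an ")


original = "Please pay 250,000 NOK to Kari Hansen (kari.hansen@example.com)."
safe, mapping = anonymize(original, known_names=["Kari Hansen"])
print(safe)      # Please pay [[AMOUNT_1]] to [[NAME_1]] ([[EMAIL_1]]).
translated = fake_translate(safe)
print(restore(translated, mapping))
# Bitte zahlen Sie 250,000 NOK an Kari Hansen (kari.hansen@example.com).
```

Only `safe` would ever be sent to the translation service; `mapping` never leaves the local environment, which is what makes the tokens useless to anyone who stores or indexes the submitted text.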

That's exactly what ONYRI Sanitize is for: the engine replaces names, identifiers, amounts and contact details with reversible tokens, and only that anonymized text reaches the translator. Detection and the token↔value mapping stay in your browser — they never leave it. Whatever the translator's terms, whether it stores, reuses or indexes the text, it only finds tokens, not your real information.
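
The claim that the translator "only finds tokens" is something you can check mechanically before and after each call. The sketch below is a hedged illustration, not ONYRI's implementation: it assumes a hypothetical `[[KIND_n]]` token format and a locally held token-to-value `mapping`, and the two helper names are invented for this example.

```python
import re

# Assumed token shape, e.g. [[NAME_1]] or [[AMOUNT_2]] (hypothetical format).
TOKEN_RE = re.compile(r"\[\[[A-Z]+_\d+\]\]")


def find_leaks(outbound_text, mapping):
    """Return original values still present in the text about to be sent."""
    return [value for value in mapping.values() if value in outbound_text]


def missing_tokens(translated_text, mapping):
    """Return tokens the translator dropped or altered, so restoring would fail."""
    returned = set(TOKEN_RE.findall(translated_text))
    return sorted(set(mapping) - returned)


mapping = {"[[NAME_1]]": "Kari Hansen", "[[AMOUNT_1]]": "250,000 NOK"}

# Before sending: nothing sensitive should remain in the clear.
print(find_leaks("Please pay [[AMOUNT_1]] to [[NAME_1]].", mapping))   # []
print(find_leaks("Please pay [[AMOUNT_1]] to Kari Hansen.", mapping))  # ['Kari Hansen']

# After translating: every token should come back unchanged.
print(missing_tokens("Bitte zahlen Sie [[AMOUNT_1]] an [[NAME_1]].", mapping))  # []
print(missing_tokens("Bitte zahlen Sie [[AMOUNT 1]] an [[NAME_1]].", mapping))  # ['[[AMOUNT_1]]']
```

A non-empty result from `find_leaks` means the text should not be sent; a non-empty result from `missing_tokens` means the translation needs a manual look before the original values are restored.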

Frequently asked questions

Are Google Translate or free AI translators safe for confidential documents?
Not by default. The free consumer version sends your text to a third-party cloud, and its terms of service grant a broad license over it; in 2017, documents run through a free translator were even indexed by Google. For confidential documents, anonymize the content before translating, or use a zero-retention pro offering.
What's the difference between a translator's free version and its pro tier?
The pro tier is generally zero-retention: for Google's Cloud Translation API, text is neither kept long-term, used for training, nor shared; DeepL's Pro version deletes texts and translations after the service runs. The free consumer version, by contrast, can keep and reuse text to improve its models.
How can I translate a confidential document without exposing it?
Anonymize it before submitting: replace names, identifiers, amounts and contact details with reversible tokens, translate the neutralized text, then restore the original values in your browser. The translator never sees the real information, whatever its terms of service.

Keep your sensitive data in your browser

ONYRI Sanitize detects and masks your sensitive data before it reaches the AI, then restores the answer — from names to API keys.
