Confidentiality · 7 min read

What 'do not train on my data' actually means in 2026

A side-by-side of what ChatGPT, Claude, Copilot and Gemini retain when you use the consumer vs. enterprise plans — and where the contractual cliffs are.

By The Redline Editors

A vendor tells you, in friendly sans-serif on a clean landing page, that they "don't train on your data." You're a careful lawyer. You want to believe it. The problem is that the sentence is doing a lot of quiet work, and most of it happens in the gap between the marketing page and the contract you didn't read.

Here's what that sentence usually leaves out.

The claim is about a plan, not a product

"We don't train on your data" is almost never true of every version of a tool. It's true of a tier. The consumer plan you'd naturally sign up for — the twenty-dollar-a-month one — is frequently the exact plan where your inputs are retained, used to improve the model, or held for some period you'd have to dig through a policy to find.

Take the two tools most lawyers reach for first. ChatGPT Plus and Claude Pro are consumer subscriptions. They are excellent. By default, they are also the wrong place to put anything tied to a client. Paying the consumer price does not give you a data processing agreement, and it does not automatically opt you out of training. You can sometimes switch training off in settings. But "I changed a toggle once" is not a position you want to defend to a regulator.

The business and enterprise tiers are a different animal. There, training on your inputs is typically off by default and backed by a contract — a DPA that makes the vendor your processor, accountable to you, rather than an audience for your client's secrets. Same brand. Same logo. Completely different confidentiality posture.

So the question is never "does this tool train on my data?" It's: on which plan, under which contract, and is the protection on by default or do I have to switch it on?

Why this lands on you, not the vendor

You might reasonably think that if a tool leaks your client's information, the tool's maker is on the hook. In a commercial sense, maybe. In a professional-conduct sense, no.

The SRA is direct about it: solicitors remain responsible for client confidentiality even when they use technology, and the duty doesn't transfer to the software company. If confidential information ends up somewhere it shouldn't, that's your problem to answer for.

The American position arrived in writing in July 2024. ABA Formal Opinion 512, the ABA's first formal ethics opinion on generative AI, applies Model Rule 1.6 to these tools plainly. A lawyer must make reasonable efforts to prevent the inadvertent or unauthorized disclosure of information relating to a representation. A lawyer must also get the client's informed consent before putting that information into a self-learning AI tool. A boilerplate line buried in an engagement letter does not count. Informed consent means the client actually understood the risk.

The SRA and the ABA are saying the same thing in two accents: the convenience is yours to use, and the responsibility is yours to carry.

The contractual cliffs

When you compare the consumer and business tiers of the main tools, the differences cluster in three places. Check all three on any vendor:

  • Training. Does the plan use your inputs to improve the model? Consumer tiers often yes-by-default; business tiers usually no-by-default. This is the headline, but it's not the whole story.
  • Retention. Even when a tool doesn't train on your data, it may still store it — for abuse monitoring, for support, for a fixed window. "Not trained on" and "not retained" are different promises. Ask for both.
  • Sub-processors. Your text often passes through the underlying model provider and other vendors in the chain. A tool can honestly say it doesn't train on your data while still transmitting that data to companies you've never evaluated. The current sub-processor list is the document that tells the truth here.

None of this means you can't use these tools. It means the safe path runs through the business tier, the DPA, and a quick read of where your text actually goes.

What a careful lawyer actually does

The habit that keeps you safe is smaller than the anxiety around it. Three moves:

  • First, anonymise before you paste. Strip names, dates, parties, case numbers, and any distinctive fact that could re-identify the matter. Brief the model on the shape of the problem, not the identity of the people in it.
  • Second, use a business tier with a DPA for anything that even brushes client work, and keep the consumer tiers for genuinely generic tasks — marketing copy, learning a concept, drafting a structure with no real facts in it.
  • Third, read the output yourself, every line. You are the lawyer of record. The model can be confidently, fluently wrong, and "the AI said so" has never once been a defence.

Do those three things and the scary version of this technology quietly becomes the useful version — the one that gives you back the hours without putting your licence near the fire.

Disclaimer · Educational content about software and productivity, not legal advice. AI tools and regulatory guidance change frequently — always evaluate any tool against your own firm's obligations and your regulator's current guidance (e.g. the SRA in England & Wales, or your state bar / the ABA in the US) before using it with client data.
