How to Opt Out of AI Training Data
Every major AI provider offers some way to prevent your conversations, documents, or published content from being used to train future models. Some are one-toggle settings; some require a formal legal objection. This guide covers all of them, with direct opt-out instructions and the legal basis behind each.
Fast path
1. Log in to each AI service and toggle off training/history settings.
2. For providers without a setting: send a formal GDPR Art. 21 objection (EU/EEA) or a CCPA/state-law deletion request (US).
3. Block Common Crawl (CCBot) in your own website's robots.txt.
4. Choose enterprise tiers when available; most contractually exclude customer data from training.
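For step 2, the objection itself is just a short email citing the relevant articles. A minimal sketch of rendering one from a template; the recipient address, wording, and all example values are illustrative, not legal advice:

```python
# Sketch of a GDPR Art. 21 objection email. The wording is illustrative
# and should be adapted; it is not legal advice.
OBJECTION_TEMPLATE = """\
To: {privacy_email}
Subject: Objection to processing under GDPR Article 21

Dear {provider} privacy team,

I object under Article 21 GDPR to the processing of my personal data
for model training on the basis of legitimate interest (Art. 6(1)(f)
GDPR). Please confirm that data associated with {account_email} will
be excluded from training, and that stored conversation logs are
erased per Article 17 GDPR.

Regards,
{name}
"""

def render_objection(provider: str, privacy_email: str,
                     account_email: str, name: str) -> str:
    """Fill the objection template for one provider."""
    return OBJECTION_TEMPLATE.format(provider=provider,
                                     privacy_email=privacy_email,
                                     account_email=account_email,
                                     name=name)

# Example: Anthropic publishes privacy@anthropic.com for objections.
print(render_objection("Anthropic", "privacy@anthropic.com",
                       "me@example.com", "Jane Doe"))
```

Providers that publish a privacy contact (Anthropic and Cohere, per the list below) can receive this directly; for the rest, the in-product settings are the faster route.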
Provider-by-provider opt-out
OpenAI
Free to opt out
Models: ChatGPT, GPT-5, DALL-E, Sora
ChatGPT settings: toggle off "Improve the model for everyone". API users: default is no training use. Legal basis: OpenAI relies on legitimate interest (GDPR Art. 6(1)(f)) which can be objected to per Art. 21.
Anthropic
Not used by default
Models: Claude (all versions)
Anthropic does not train on API inputs or Claude.ai conversations by default. To formally object to any use, email privacy@anthropic.com citing GDPR Art. 21 or applicable state law.
Google
Models: Gemini, Gemini Advanced
Gemini Apps Activity setting at myactivity.google.com — turn off to prevent Gemini from using your conversations for training. Enterprise and Workspace usage is separate.
Meta
Opt-out available
Models: Llama, Meta AI
EU/UK users: object via Instagram/Facebook settings after receiving notification. US users: submit objection at facebook.com/help/contact. Meta trained Llama on scraped public data; the opt-out is narrower than other providers.
Perplexity
Free to opt out
Models: Perplexity AI, Pro Search
Settings > Privacy > toggle "AI Data Retention" off. Does not use conversations for model improvement when disabled.
Mistral AI
Opt-out via settings
Models: Le Chat, Codestral, Mistral Large
Le Chat Privacy Settings — disable "Use my data to improve the model." API users: no training by default.
Microsoft
Free to opt out
Models: Copilot, Bing Chat
Microsoft Privacy Dashboard: account.microsoft.com/privacy — turn off "Chat history." Enterprise Copilot is covered by Microsoft 365 terms (no training on customer data).
xAI (Grok)
Opt-out available
Models: Grok
X settings > Data sharing > disable "Share your posts for training." Grok API: no training by default.
Cohere
Not used by default
Models: Command R, Command R+
API inputs not used for training by default. To object to any historical use, email privacy@cohere.com.
Common Crawl (upstream dataset)
Robots.txt respected
Used by most training pipelines
Add CCBot to your robots.txt:

User-agent: CCBot
Disallow: /

Common Crawl is the most influential web-scrape dataset, used to train nearly every major LLM.
Blocking AI training crawlers on your own website
If you publish content on your own site and want to block AI training crawlers, add the following to your robots.txt:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /
Note: Major AI providers publicly commit to honoring robots.txt, but enforcement varies, and some training-data collectors ignore it entirely. Litigation such as The New York Times v. OpenAI is still shaping the legal landscape around training-data scraping.
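To confirm your rules actually disallow these crawlers, you can parse the file with Python's standard library. A minimal sketch, assuming a two-entry robots.txt; the crawler list here is a short illustrative subset of the block above:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content: only GPTBot and CCBot are disallowed here.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

# Illustrative subset of known AI training crawlers.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"]

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the crawlers from AI_CRAWLERS that these rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Crawlers with no matching entry (and no "*" entry) default to allowed.
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]

print(blocked_crawlers(ROBOTS_TXT))  # → ['GPTBot', 'CCBot']
```

In production, point `RobotFileParser.set_url()` at your live `https://example.com/robots.txt` and call `read()` instead of parsing a string, so you test exactly what crawlers see.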
Legal basis: your rights
GDPR Article 21
Right to object to processing based on legitimate interest. Most AI providers cite legitimate interest for training — this is the primary objection pathway in EU/EEA.
GDPR Article 17
Right to erasure. Applies to training data if the lawful basis for processing is missing or withdrawn.
CCPA (California)
Right to delete (§ 1798.105) and right to opt out of sale/sharing (§ 1798.120). Applies to AI providers meeting CCPA thresholds.
CA AB 2013 (2026)
California's AB 2013 (effective 2026) requires developers of generative AI systems to publish documentation summarizing their training datasets, enabling targeted opt-out requests.
Colorado CPA profiling opt-out
Right to opt out of profiling that produces legal or similarly significant effects.
Minnesota MCDPA
Right to question automated decisions and demand human review.
After AI opt-out
Complete the privacy pass
Opting out of AI training stops future exposure. Removing data brokers stops current exposure. OfflistMe handles 200+ brokers in one run for $2.
Start broker opt-out for $2 →

FAQ
Can I remove my data from a model that is already trained?
Not directly. Once a model is trained, individual data points are embedded across billions of parameters and cannot be mechanically deleted. You can, however: (1) object to future use under GDPR Article 21 or the CCPA, (2) request deletion of conversation logs that might be used for fine-tuning, (3) prevent future inclusion by opting out of conversation storage, and (4) submit a formal erasure demand, which may obligate providers to exclude your data from future model versions.
What legal rights do I have against AI training?
Under GDPR, Article 21 grants the right to object to processing based on legitimate interest, which most AI providers rely on. Under GDPR Article 17, the right to erasure covers training data if lawful basis is absent. Under CCPA, the right to delete and right to opt out of sale both apply. State laws (Colorado CPA, Minnesota MCDPA) grant specific opt-outs from profiling and automated decision-making. California and Illinois have also passed AI-specific transparency laws (AB 2013 and HB 5116 respectively).
What is the difference between "training opt-out" and "chat history opt-out"?
They are related but distinct. Chat-history opt-out prevents your conversations from being stored at all. Training opt-out prevents stored conversations from being used to improve future models. Some providers offer both (e.g., OpenAI); others offer only one. For complete protection, enable both where available.
Do enterprise AI subscriptions include training opt-out by default?
Most enterprise tiers (OpenAI Enterprise, ChatGPT Team, Microsoft Copilot for Microsoft 365, Anthropic Claude Enterprise, Google Gemini for Workspace) contractually guarantee no training on customer data. This is a major reason enterprise tiers exist. Verify in the specific DPA (Data Processing Agreement).
Can I opt my copyrighted content out of being training data?
Partially. Common Crawl and other upstream datasets respect robots.txt: add "User-agent: CCBot" with "Disallow: /" to prevent inclusion in future crawls. Content already in existing datasets remains, however. Platforms like Substack, Medium, and GitHub are adding bulk opt-out mechanisms, and The New York Times v. OpenAI is shaping case law on this.