User-Agent: *
Disallow: /401
Disallow: /403
Disallow: /404
Disallow: /422
Disallow: /500
Disallow: /offline.html
Disallow: /error.html
Disallow: /cdn-cgi/
Disallow: /r/
Disallow: /p/
Disallow: /pr/
Disallow: /go/
Disallow: /oauth/
Disallow: /data/
Disallow: /users/settings
Disallow: /users/auth
Disallow: /users/sign_up
Disallow: /users/sign_in
Disallow: /users/sign_out
Disallow: /users/password
Disallow: /users/confirmation
Disallow: /company_admin
Allow: /

Sitemap: https://www.consumersadvocate.org/system/sitemap.xml.gz

# ===========================================
# AI Crawler Policies
# ===========================================

# AI crawlers allowed for inference/search (to be cited in AI answers)
# These crawlers help users discover our content through AI-powered search

# OpenAI GPT (ChatGPT, Bing Chat)
User-agent: GPTBot
Allow: /
Disallow: /users/
Disallow: /company_admin/
Disallow: /oauth/

# Anthropic Claude
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: anthropic-ai
Allow: /
Disallow: /users/
Disallow: /company_admin/
Disallow: /oauth/

# Perplexity AI
User-agent: PerplexityBot
Allow: /
Disallow: /users/
Disallow: /company_admin/
Disallow: /oauth/

# You.com
User-agent: YouBot
Allow: /
Disallow: /users/
Disallow: /company_admin/
Disallow: /oauth/

# Cohere AI
User-agent: cohere-ai
Allow: /
Disallow: /users/
Disallow: /company_admin/
Disallow: /oauth/

# ===========================================
# AI Training Crawlers (Blocked)
# ===========================================

# These crawlers are primarily used for training data collection
# Block to protect content from being used in AI model training

# Common Crawl (used for AI training datasets)
User-agent: CCBot
Disallow: /

# Google AI training crawler
User-agent: Google-Extended
Disallow: /

# Facebook/Meta AI training
User-agent: FacebookBot
Disallow: /

# ByteDance/TikTok AI
User-agent: Bytespider
Disallow: /

# Apple AI training (Applebot-Extended is for AI, regular Applebot is for Siri/Spotlight)
User-agent: Applebot-Extended
Disallow: /

# Omgili/Webz.io (data harvesting for AI)
User-agent: omgili
User-agent: omgilibot
Disallow: /

# Diffbot (AI data extraction)
User-agent: Diffbot
Disallow: /

# ===========================================
# LLMs.txt for AI-friendly content access
# ===========================================

# See /llms.txt for structured information optimized for LLMs