# Robots.txt for PlainJane.com
# OPTION B: AGGRESSIVE SEO + AI CRAWLERS ALLOWED
# Philosophy: Maximum crawlability for search engines, SEO tools, and AI
# Last Updated: October 13, 2025

# =================================================================
# DEFAULT CRAWLERS - Allow Everything Except Admin/Checkout
# =================================================================
# Note: a bot named in any group below ignores this "*" group entirely
# and follows only the rules in its own group.
User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /orders/
Disallow: /checkouts/
Disallow: /checkout
Disallow: /account
Disallow: /checkout_preview
Disallow: /remote_products
Disallow: /apple-app-site-association
# ALLOW everything else
# Allow: / already covers /products/, /collections/, /pages/, and /blogs/.
# Avoid longer path-specific Allow lines here: under longest-match
# precedence, Allow: /collections/ (13 characters) would override the
# session-ID Disallow below (10 characters) on every collection URL.
Allow: /
# Block only session IDs (duplicate content)
Disallow: /*?*oseid=
# ALLOW sorted, filtered, and paginated pages
# (These help with product discovery and long-tail keywords)
# Note: Use canonical tags on your site to manage duplicate content
Allow: /*?*sort_by=
Allow: /*?*filter
Allow: /*?*page=

# Sitemap
Sitemap: https://plainjane.com/sitemap.xml

# =================================================================
# GOOGLE CRAWLERS - HIGHEST PRIORITY, MINIMAL RESTRICTIONS
# =================================================================
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: Googlebot-News
User-agent: Googlebot-Video
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Disallow: /account
Allow: /

# Google Ads Bot (AdsBot ignores the "*" group, so it must be named)
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
Disallow: /checkouts/
Disallow: /cart
Disallow: /orders/
Allow: /

# Google-Extended: not a separate crawler, but a control token that
# governs whether content fetched by Google's crawlers may be used for
# Gemini training and grounding (it does not affect Google Search)
User-agent: Google-Extended
Allow: /

# =================================================================
# AI CRAWLERS - EXPLICITLY ALLOWED FOR MAXIMUM VISIBILITY
# =================================================================
# Crawl-delay is a nonstandard extension; some crawlers ignore it.

# OpenAI (GPTBot: training; ChatGPT-User: user-initiated fetches;
# OAI-SearchBot: ChatGPT search)
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Crawl-delay: 1
Allow: /

# Anthropic (Claude AI)
User-agent: anthropic-ai
User-agent: Claude-Web
Crawl-delay: 1
Allow: /

# Common Crawl (used by many AI companies for training)
User-agent: CCBot
Crawl-delay: 1
Allow: /

# Perplexity AI
User-agent: PerplexityBot
Crawl-delay: 1
Allow: /

# Cohere AI
User-agent: cohere-ai
Crawl-delay: 1
Allow: /

# Facebook/Meta AI
User-agent: FacebookBot
User-agent: meta-externalagent
Crawl-delay: 1
Allow: /

# Apple (Applebot: Siri and Spotlight search; Applebot-Extended: control
# token for Apple Intelligence training, not a separate crawler)
User-agent: Applebot
User-agent: Applebot-Extended
Crawl-delay: 1
Allow: /

# Amazon AI
User-agent: Amazonbot
Crawl-delay: 1
Allow: /

# Diffbot (AI-powered web data extraction)
User-agent: Diffbot
Crawl-delay: 1
Allow: /

# Omgilibot (Webz.io web data crawler)
User-agent: omgilibot
Crawl-delay: 1
Allow: /

# YouBot (You.com AI search)
User-agent: YouBot
Crawl-delay: 1
Allow: /

# Brave Search (privacy-focused search engine)
User-agent: brave-search-bot
User-agent: BraveSearchBot
Crawl-delay: 1
Allow: /

# Kagi Search (premium search engine)
User-agent: Kagi-Bot
Crawl-delay: 1
Allow: /

# Mojeek (independent search engine)
User-agent: MojeekBot
Crawl-delay: 1
Allow: /

# Mistral AI
User-agent: mistral-crawler
Crawl-delay: 1
Allow: /

# Hugging Face AI
User-agent: HuggingFaceBot
Crawl-delay: 1
Allow: /

# Character.AI
User-agent: character-ai
Crawl-delay: 1
Allow: /

# Poe (Quora AI platform)
User-agent: poe-crawler
Crawl-delay: 1
Allow: /

# ClaudeBot (newer Anthropic bot name, in addition to anthropic-ai)
User-agent: ClaudeBot
Crawl-delay: 1
Allow: /

# =================================================================
# MAJOR SEARCH ENGINES
# =================================================================
# Microsoft Bing
User-agent: bingbot
User-agent: msnbot
User-agent: BingPreview
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Disallow: /account
Allow: /

# Yahoo
User-agent: Slurp
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# DuckDuckGo
User-agent: DuckDuckBot
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# Baidu (Chinese search engine)
User-agent: Baiduspider
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# Yandex (Russian search engine)
User-agent: Yandex
User-agent: YandexBot
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# =================================================================
# SEO MONITORING TOOLS - ALLOWED FOR SITE AUDITING
# =================================================================
# Ahrefs - Allow for backlink monitoring and site audits
User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
Crawl-delay: 1
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# SEMrush - Allow for competitive analysis
User-agent: SemrushBot
Crawl-delay: 1
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# Moz - Allow for SEO monitoring
User-agent: rogerbot
User-agent: dotbot
Crawl-delay: 1
Allow: /

# Screaming Frog SEO Spider
User-agent: ScreamingFrogSEOSpider
Allow: /

# Majestic SEO
User-agent: MJ12bot
Crawl-delay: 2
Disallow: /admin/
Disallow: /cart
Disallow: /checkouts/
Allow: /

# =================================================================
# SOCIAL MEDIA CRAWLERS
# =================================================================
# Facebook
User-agent: facebookexternalhit
User-agent: facebookcatalog
Allow: /

# Twitter/X
User-agent: Twitterbot
Allow: /

# Pinterest
User-agent: Pinterest
User-agent: Pinterestbot
Crawl-delay: 1
Allow: /

# LinkedIn
User-agent: LinkedInBot
Allow: /

# Slack
User-agent: Slackbot
Allow: /

# Discord
User-agent: Discordbot
Allow: /

# WhatsApp
User-agent: WhatsApp
Allow: /

# Telegram
User-agent: TelegramBot
Allow: /

# Reddit (redditbot fetches link previews)
User-agent: Snoobot
User-agent: redditbot
Allow: /

# =================================================================
# OTHER LEGITIMATE CRAWLERS
# =================================================================
# Apache Nutch (used by various search tools)
User-agent: Nutch
Crawl-delay: 2
Disallow: /admin/
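Disallow: /cart
Disallow: /checkouts/
Allow: /

# Internet Archive (archive.org_bot crawls for the Wayback Machine;
# ia_archiver is the older Alexa-era token)
User-agent: ia_archiver
User-agent: archive.org_bot
Allow: /

# =================================================================
# BAD BOTS - HEAVILY RESTRICTED OR BLOCKED
# =================================================================
# Known scrapers and spam bots
# (robots.txt is advisory: clients such as curl and python-requests
# never fetch it, so block them at the server or CDN if needed)
User-agent: proximic
User-agent: BLEXBot
User-agent: MegaIndex
User-agent: Scrapy
User-agent: python-requests
User-agent: wget
User-agent: curl
Crawl-delay: 30
Disallow: /

# Aggressive SEO spiders that don't respect crawl delays
# Not active: RFC 9309 product tokens contain only letters, "-" and "_",
# so a version suffix such as "/7~bl" cannot target one bot version.
# Lenient parsers may instead fold these lines into the SemrushBot and
# AhrefsBot groups above, contradicting the rules that allow them.
# User-agent: SemrushBot/7~bl
# User-agent: AhrefsBot/5.0
# Disallow: /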
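
# =================================================================
# HOW THESE RULES ARE EVALUATED (worked examples, comments only)
# =================================================================
# A sketch assuming a crawler that follows RFC 9309 as Google documents
# it: the bot obeys only the most specific User-agent group naming it,
# the longest matching path wins, and an Allow wins a tie. Other
# crawlers may evaluate rules differently. The URLs are hypothetical.
#
# Unnamed bot ("*" group), /search?q=dress&oseid=abc123
#   Allow: /                matches (1 character)
#   Disallow: /*?*oseid=    matches (10 characters)  -> BLOCKED
#
# Unnamed bot ("*" group), /search?filter=red&oseid=abc123
#   Allow: /                matches (1 character)
#   Allow: /*?*filter       matches (10 characters)
#   Disallow: /*?*oseid=    matches (10 characters)  -> tie, ALLOWED
#
# Googlebot (Googlebot group only), /orders/1001
#   Allow: /                matches (1 character)
#   No Disallow in the Googlebot group matches       -> ALLOWED,
#   even though the "*" group disallows /orders/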