# True Protein robots.txt (SEO + AI Visibility)
# Updated: 2026-03-13 v2
#
# Philosophy:
# - Allow by default. Block only non-public system paths.
# - Deduplication (sort/filter URLs) is handled by Shopify's canonical
#   tags, NOT robots.txt. Blocking prevents Google from seeing the
#   canonical, which is worse than letting it crawl and consolidate.
# - AI bots get the same system blocks as everyone else, plus
#   explicit access to all public content for maximum AI visibility.

Sitemap: https://www.trueprotein.com.au/sitemap.xml

# ======================
# Global (every crawler without its own group below)
# ======================
User-agent: *
# System / transactional (auth-walled, no public content)
Disallow: /admin
Disallow: /account
Disallow: /cart
Disallow: /carts
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /*/checkouts
Disallow: /*/orders
# Shopify internals
Disallow: /a/downloads/-/*
Disallow: /apple-app-site-association
Disallow: /.well-known/shopify/monorail
Disallow: /cdn/wpm/*.js
# Search results page (thin/duplicate content)
Disallow: /search
# Preview / tracking params
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
# App-generated "remote" product duplicates
# Not in sitemap, no canonical tags. Safe to block.
# Note: robots.txt only supports * and $ wildcards, not regex.
Disallow: /products/*-remote$
Disallow: /*/products/*-remote$
Disallow: /collections/*/products/*-remote$
Disallow: /*/collections/*/products/*-remote$

# Note: sort_by and filter params are deliberately NOT blocked here.
# Shopify adds a rel="canonical" tag pointing to the clean collection
# URL. Blocking in robots.txt would prevent Google from seeing that
# canonical tag, potentially causing "Indexed, though blocked by
# robots.txt" entries instead of proper consolidation.
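# A minimal Python sketch of how the * and $ wildcards above are
# evaluated, following the matching rules of RFC 9309: * matches any
# run of characters (including /), a trailing $ anchors the end of the
# URL path plus query, and anything else matches as a prefix.
# GLOBAL_DISALLOW, pattern_to_regex, and is_blocked are hypothetical
# names; the sketch ignores Allow rules and percent-encoding, so it
# illustrates the rules rather than reproducing any real crawler.

```python
import re
from typing import List

# Disallow patterns copied from the global (User-agent: *) group above.
GLOBAL_DISALLOW = [
    "/admin", "/account", "/cart", "/carts", "/orders", "/checkouts/",
    "/checkout", "/*/checkouts", "/*/orders",
    "/a/downloads/-/*", "/apple-app-site-association",
    "/.well-known/shopify/monorail", "/cdn/wpm/*.js",
    "/search", "/*?*oseid=*", "/*preview_theme_id*", "/*preview_script_id*",
    "/products/*-remote$", "/*/products/*-remote$",
    "/collections/*/products/*-remote$", "/*/collections/*/products/*-remote$",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' becomes an end-of-string anchor;
    every other character is escaped so it matches literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path_and_query: str, patterns: List[str]) -> bool:
    """True if any pattern matches from the start of the path + query.

    re.match anchors at the start only, so unanchored patterns behave as
    prefixes. With no Allow lines in the group (as in this file), rule
    precedence does not matter and one matching Disallow blocks the URL.
    """
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)


for url in ("/products/whey-remote", "/products/whey-remote?variant=1",
            "/collections/whey?sort_by=price-ascending", "/carts/abc"):
    print(url, "blocked" if is_blocked(url, GLOBAL_DISALLOW) else "allowed")
```

# The first and last URLs come out blocked (/cart also matches /carts/...
# by prefix). The ?variant=1 URL is allowed because $ anchors the end of
# path + query, so the -remote$ rules no longer match, and the sort_by
# URL stays crawlable as intended.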
# ======================
# Ads crawler
# ======================
# AdsBot-Google ignores User-agent: *, so it needs its own group.
User-agent: adsbot-google
Disallow: /checkouts/
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /*/checkouts
Disallow: /*/orders
Disallow: /products/*-remote$
Disallow: /*/products/*-remote$
Disallow: /collections/*/products/*-remote$
Disallow: /*/collections/*/products/*-remote$

# ======================
# Crawl rate controls
# ======================
# Crawl-delay is non-standard (Googlebot ignores it). A bot that
# matches one of these named groups ignores the global group, so
# these bots are NOT subject to the global Disallows above.
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: AhrefsSiteAudit
Crawl-delay: 10

User-agent: MJ12bot
Crawl-delay: 10

User-agent: Pinterest
Crawl-delay: 1

# Block heavy generic crawler
User-agent: Nutch
Disallow: /

# ======================================================
# AI & LLM Bots (Generative Engine Optimization)
#
# Goal: maximum visibility for brand citations in AI.
# These bots get the same system Disallows as the global
# group, repeated below because a named group does not
# inherit User-agent: * rules (and there is no point
# crawling auth-walled pages). All public product,
# collection, and blog content is fully open.
#
# Two categories are listed in one group for simplicity:
# 1. Search & retrieval bots: affect what AI search
#    products (ChatGPT, Perplexity, etc.) show users.
# 2. Training crawlers & usage tokens: control whether our
#    content is used for AI model training. Some are
#    separate crawlers (GPTBot, ClaudeBot,
#    Meta-ExternalAgent, Bytespider, CCBot). Others are
#    NOT separate crawlers; they govern data usage by the
#    parent crawler (e.g., Google-Extended controls
#    Gemini's use of content Googlebot already crawled,
#    and Applebot-Extended does the same for Applebot).
# ======================================================
# --- Search & retrieval (affect AI search results) ---
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-User
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Amazonbot
User-agent: Diffbot
# --- Training crawlers & usage tokens (opt in to AI training) ---
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: cohere-ai
User-agent: Meta-ExternalAgent
User-agent: Bytespider
User-agent: CCBot
# System paths (as in the global group: auth-walled, no useful content)
Disallow: /admin
Disallow: /account
Disallow: /cart
Disallow: /carts
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /*/checkouts
Disallow: /*/orders
Disallow: /search
# App-generated duplicates
Disallow: /products/*-remote$
Disallow: /*/products/*-remote$
Disallow: /collections/*/products/*-remote$
Disallow: /*/collections/*/products/*-remote$
# Rate limiting (protects the server without reducing visibility).
# Crawl-delay is non-standard: bots that don't support it ignore it,
# and it has no effect on usage tokens such as Google-Extended.
Crawl-delay: 1
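# A minimal Python sketch of group selection as RFC 9309 describes
# it: a crawler obeys only the group(s) naming its product token
# (matched case-insensitively) and falls back to User-agent: * only
# when no group names it. That is why the AI group repeats the system
# Disallows, and why the Crawl-delay-only groups inherit none of the
# global blocks. Group, parse_groups, select_group, and SAMPLE are
# hypothetical; the sketch assumes the caller passes the product token
# (e.g. GPTBot, not the full User-Agent header), tracks only Disallow
# and the non-standard Crawl-delay, and skips Google-specific details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """One robots.txt group: its user-agent tokens and the rules kept here."""
    agents: List[str] = field(default_factory=list)
    blocked: List[str] = field(default_factory=list)  # Disallow paths
    delay: Optional[float] = None  # Crawl-delay (non-standard extension)


def parse_groups(text: str) -> List[Group]:
    """Split robots.txt text into groups.

    Consecutive user-agent lines share one group (comment-only lines in
    between are skipped); the first user-agent line after a rule starts
    a new group. Other fields such as Sitemap are ignored.
    """
    groups: List[Group] = []
    current: Optional[Group] = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank, comment-only, or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("disallow", "crawl-delay") and current is not None:
            if key == "disallow" and value:  # empty Disallow blocks nothing
                current.blocked.append(value)
            elif key == "crawl-delay":
                current.delay = float(value)
            last_was_agent = False
    return groups


def select_group(groups: List[Group], product_token: str) -> Group:
    """Return the rules a crawler with this product token obeys: every
    group naming the token (merged), else the '*' group, else nothing."""
    token = product_token.lower()
    matched = [g for g in groups if token in g.agents]
    if not matched:
        matched = [g for g in groups if "*" in g.agents]
    merged = Group(agents=[token])
    for g in matched:
        merged.blocked.extend(g.blocked)
        if g.delay is not None:
            merged.delay = g.delay
    return merged


# A cut-down version of the groups in this file (hypothetical sample).
SAMPLE = "\n".join([
    "User-agent: *",
    "Disallow: /admin",
    "Disallow: /cart",
    "",
    "User-agent: AhrefsBot",
    "Crawl-delay: 10",
    "",
    "User-agent: GPTBot",
    "# comment-only lines do not end a group",
    "User-agent: Google-Extended",
    "Disallow: /cart",
    "Disallow: /search",
    "Crawl-delay: 1",
])

groups = parse_groups(SAMPLE)
for bot in ("Googlebot", "AhrefsBot", "GPTBot", "Google-Extended"):
    rules = select_group(groups, bot)
    print(bot, rules.blocked, rules.delay)
```

# Googlebot falls back to the * group; AhrefsBot gets its Crawl-delay
# but no Disallows; GPTBot and Google-Extended share one group because
# the comment line between their User-agent lines does not split it.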