# ===============================================================
# robots.txt for m-s-y.com
# Physical file deployment: 2026-04-17
# Reason: avoid the LiteSpeed Cache 7-day TTL issue on the virtual robots.txt
# Source of truth: Git docs/robots.txt.source -> FTP to webroot
# ===============================================================

# --- Standard WordPress + SEO controls ---
# Block /wp-admin/ (except admin-ajax.php) and WP search result pages
# (infinite URL generation, crawl budget waste).
# /wp-includes/ is intentionally NOT blocked: Googlebot needs its CSS/JS assets
# to render pages for Core Web Vitals.
User-agent: *
Disallow: /wp-admin/
Disallow: /*?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php

# === AI training data scrapers: BLOCK ===
# Reject large-scale scraping for LLM training data.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: claude-web
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# === AI search / real-time citation agents: EXPLICITLY ALLOW ===
# Allow retrieval-augmented generation (RAG) indexing and user-initiated
# fetches by AI assistants.
# Note: a crawler that matches a named group ignores the "User-agent: *" group
# (RFC 9309), so these agents are not bound by the /wp-admin/ and search rules above.
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

# --- Sitemap ---
Sitemap: https://m-s-y.com/sitemap.xml