# PhishDestroy robots.txt
# https://phishdestroy.io

# ═══ Content Signals (per draft-romm-aipref-contentsignals) ═══════════════
# PhishDestroy is volunteer-driven anti-phishing CTI. We WANT AI training
# + search indexing + agent input on our threat data: that's the point
# of publishing it publicly. CC-BY-4.0 attribution applies.
# Non-RFC directive (draft spec), kept as comments so Lighthouse robots.txt
# parser doesn't flag; see also /ai.txt and /llms.txt for machine-readable
# AI-preference metadata.
# Content-Signal: ai-train=yes, search=yes, ai-input=yes

User-agent: *
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /domain/*/llm.txt
# Public read-only CTI endpoints (Allow beats Disallow: /api/)
Allow: /api/stats.php
Allow: /api/stats-cti.php
Allow: /api/probe.php
Allow: /feed.xml
Allow: /feed-threats.xml
# CTI pages: crawl welcome
Allow: /hub
Allow: /campaigns
Allow: /registrars
Allow: /geo
Allow: /target/
Disallow: /api/
Disallow: /domain/cache/
Disallow: /cache/
Disallow: /*?rescan=
Disallow: /*&rescan=
Disallow: /*?_subid=
Disallow: /*?_token=
Disallow: /*?fresh=1
Disallow: /*&fresh=1
Disallow: /analytics/?*p=
Disallow: /gambler/index

User-agent: GPTBot
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt

User-agent: Claude-Web
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Google-Extended
Allow: /

# Heavy SEO scanners: throttle
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: SemrushBot
Crawl-delay: 10

Sitemap: https://phishdestroy.io/sitemap.xml
Sitemap: https://phishdestroy.io/sitemap-pages.xml
Sitemap: https://phishdestroy.io/sitemap-pages-new.xml
Sitemap: https://phishdestroy.io/sitemap-images.xml
Sitemap: https://phishdestroy.io/rss.xml