# Content Signals (Cloudflare standard — https://content-signals.org) # Declares Helius's preferred uses of this content: # search = yes : indexing for traditional and AI-powered search # ai-input = yes : citation in AI-generated answers # ai-train = no : use in AI model training corpora Content-Signal: search=yes, ai-input=yes, ai-train=no # Default rule for all crawlers User-agent: * Allow: / # ===================================================================== # Tier 1 — AI search / answer bots (cite sources in AI responses) # ===================================================================== # OpenAI User-agent: OAI-SearchBot Allow: / User-agent: ChatGPT-User Allow: / # Anthropic (Claude) User-agent: Claude-User Allow: / User-agent: Claude-SearchBot Allow: / # Perplexity User-agent: PerplexityBot Allow: / # Google AI (Gemini) User-agent: Google-Extended Allow: / # Apple AI User-agent: Applebot-Extended Allow: / # DuckDuckGo AI User-agent: DuckAssistBot Allow: / # Mistral (Le Chat user browsing) User-agent: MistralAI-User Allow: / # Phind (developer-focused AI search) User-agent: PhindBot Allow: / # You.com User-agent: YouBot Allow: / # ===================================================================== # Tier 2 — Dual-use crawlers (training + answer/search) # Allowed: these bots power the AI tools developers use to reach Helius. # ===================================================================== # OpenAI (training; also powers ChatGPT browsing) User-agent: GPTBot Allow: / # Anthropic (training) User-agent: ClaudeBot Allow: / User-agent: anthropic-ai Allow: / # Meta AI User-agent: FacebookBot Allow: / User-agent: meta-externalagent Allow: / # Cohere User-agent: cohere-ai Allow: / # Amazon (Rufus, Alexa+, Nova) User-agent: Amazonbot Allow: / # ===================================================================== # Tier 3 — Pure-training crawlers (no user-facing citation) # Disallowed: bulk dataset builders that don't attribute or cite sources. # ===================================================================== # Common Crawl (feeds open-source LLM training corpora) User-agent: CCBot Disallow: / # ByteDance (Doubao / ByteDance model training) User-agent: Bytespider Disallow: / # Allen AI (Dolma open-training corpus) User-agent: Ai2Bot Disallow: / User-agent: Ai2Bot-Dolma Disallow: / Sitemap: https://www.helius.dev/sitemap.xml Sitemap: https://www.helius.dev/docs/sitemap.xml Schemamap: https://www.helius.dev/schema-map.xml