# robots.txt for abqjournal.com

# Allow major search engines
User-agent: Googlebot
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: Meta-ExternalFetcher
Allow: /

User-agent: Facebot
Allow: /

User-agent: Twitterbot
Allow: /

# --- Block AI/LLM scrapers ---

# OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# Anthropic
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Google AI
User-agent: Google-Extended
Disallow: /

# Meta AI
User-agent: FacebookBot
Disallow: /

# Common Crawl (used to train many LLMs)
User-agent: CCBot
Disallow: /

# Cohere
User-agent: cohere-ai
Disallow: /

# Apple
User-agent: Applebot-Extended
Disallow: /

# Amazon
User-agent: Amazonbot
Disallow: /

# Bytedance / TikTok
User-agent: Bytespider
Disallow: /

# Diffbot
User-agent: Diffbot
Disallow: /

# ImagesiftBot
User-agent: ImagesiftBot
Disallow: /

# PerplexityBot
User-agent: PerplexityBot
Disallow: /

# YouBot
User-agent: YouBot
Disallow: /

# Omgili
User-agent: omgili
Disallow: /

# Catch-all for everything else not explicitly allowed
User-agent: *
Disallow: /

# Sitemaps
Sitemap: https://www.abqjournal.com/sitemapindex.xml
Sitemap: https://www.abqjournal.com/news?lab_viewport=newssitemap