
robots.txt: AI Crawler Access for Shopify


Why robots.txt is critical

Your robots.txt file configuration directly impacts your store’s accessibility to AI crawlers. A misconfigured robots.txt can make your product pages harder to retrieve, cite, validate, or recommend.

Why it matters for AI

A compliant crawler requests robots.txt before it explores your site. If a rule blocks it from a path, it fetches nothing under that path, and a Disallow: / that applies to it stops the crawl entirely. The important part is role separation: search bots, user-action fetchers, training bots, and ads-validation bots do not have the same business impact.

AI Crawlers to Know

| Crawler | Operator | Usage |
|---|---|---|
| GPTBot | OpenAI | ChatGPT training and general AI knowledge |
| OAI-SearchBot | OpenAI | ChatGPT search and citation (no training data collected) |
| OAI-AdsBot | OpenAI | ChatGPT ads landing-page validation and relevance |
| ChatGPT-User | OpenAI | ChatGPT with real-time browsing |
| PerplexityBot | Perplexity | Perplexity Search & Shopping |
| ClaudeBot | Anthropic | Crawling for Claude model training; user browsing and search use Claude-User and Claude-SearchBot (traffic doubled Q3 2025 - Q1 2026, SE Ranking, 2026) |
| Google-Extended | Google | AI usage/training control, not Google Search crawling |
| Amazonbot | Amazon | Amazon search & Alexa |
| Bytespider | ByteDance | TikTok AI features |

Why OAI-SearchBot matters: Unlike GPTBot (used for training), OAI-SearchBot is OpenAI’s retrieval crawler for search and shopping discovery. OAI-AdsBot is different again: it validates ad landing pages and should be treated as paid-media readiness, not organic GEO.

Most Common Mistakes

Figure 1 - How an AI crawler decides to index your store based on robots.txt. The crawler arrives and checks whether a global Disallow blocks everything (site invisible), then whether a specific block targets it (partially blocked); otherwise the site is accessible and indexed.
  1. Global disallow: a Disallow: / rule under User-agent: * blocks everything for all crawlers.
  2. Specific AI bot blocking: some SEO guides recommend blocking AI crawlers. This is counterproductive if you want AI assistants to recommend your products.
  3. Confusing scopes: Disallow: / blocks everything, whereas Disallow: /policies/ blocks only the policies pages.
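The three mistakes above can be reproduced offline. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate small robots.txt snippets; the store URLs and the `can_fetch` helper are hypothetical names chosen for illustration, and real crawlers may apply slightly different matching rules than this parser.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Mistake 1: a global disallow hides the whole store from every crawler.
GLOBAL_BLOCK = """\
User-agent: *
Disallow: /
"""

# Mistake 2: blocking one AI crawler by name affects only that crawler.
GPTBOT_BLOCK = """\
User-agent: GPTBot
Disallow: /
"""

# Mistake 3: a path-scoped disallow blocks only that path, not the store.
POLICIES_ONLY = """\
User-agent: *
Disallow: /policies/
"""

# Hypothetical URLs, used only for illustration.
PRODUCT_URL = "https://your-store.com/products/example-product"
POLICY_URL = "https://your-store.com/policies/refund-policy"

if __name__ == "__main__":
    print(can_fetch(GLOBAL_BLOCK, "GPTBot", PRODUCT_URL))          # False
    print(can_fetch(GPTBOT_BLOCK, "GPTBot", PRODUCT_URL))          # False
    print(can_fetch(GPTBOT_BLOCK, "PerplexityBot", PRODUCT_URL))   # True
    print(can_fetch(POLICIES_ONLY, "GPTBot", PRODUCT_URL))         # True
    print(can_fetch(POLICIES_ONLY, "GPTBot", POLICY_URL))          # False
```

Running the same helper against your live robots.txt text is a quick way to confirm which of the three cases your store falls into.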
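A configuration that avoids these mistakes blocks only sensitive paths and allows AI crawlers explicitly:

# Default rules for any crawler without a named group below
User-agent: *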
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /account

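# Note: a crawler that matches a named group below ignores the User-agent: *
# rules above. Repeat the Disallow lines in its group if those paths must
# stay blocked for that crawler.

# OpenAI crawlers (training + search/citation)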
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Other AI crawlers
User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://your-store.com/sitemap.xml
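One subtlety in a configuration shaped like this: a crawler that finds a group addressed to it by name follows that group and ignores the `User-agent: *` group. The sketch below checks this with Python's standard-library `urllib.robotparser` on an abbreviated copy of the rules. `ExampleBot` and the product path are hypothetical, and Python's parser applies the first matching rule in a group rather than the longest match, so results can differ from real crawlers when Allow and Disallow overlap within one group (they do not here).

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the configuration shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /account

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

Sitemap: https://your-store.com/sitemap.xml
"""

PARSER = RobotFileParser()
PARSER.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the example store."""
    return PARSER.can_fetch(agent, "https://your-store.com" + path)


if __name__ == "__main__":
    # Named AI crawlers can reach product pages.
    print(allowed("OAI-SearchBot", "/products/example-product"))  # True
    # A crawler without its own group falls back to the * rules.
    print(allowed("ExampleBot", "/cart"))                         # False
    print(allowed("ExampleBot", "/products/example-product"))     # True
    # GPTBot has its own group with only "Allow: /", so the * Disallow
    # lines do not apply to it.
    print(allowed("GPTBot", "/cart"))                             # True
```

If `/cart` or `/account` must stay off limits for the named crawlers too, the Disallow lines need to be repeated inside each named group.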



Frequently Asked Questions

Should I block or allow AI crawlers like GPTBot in robots.txt?
Separate the roles. For organic discovery, avoid blocking search/user agents such as OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, and Claude-User. Training bots such as GPTBot are a separate consent decision, and OAI-AdsBot matters only for paid ChatGPT ads validation.
How do I edit robots.txt on Shopify?
Shopify generates robots.txt automatically, but you can customize it in your theme's robots.txt.liquid file (Online Store → Themes → Edit code → Templates → robots.txt.liquid). Add specific Allow rules for AI crawlers there.
What is the difference between GPTBot and OAI-SearchBot?
GPTBot is OpenAI's training crawler. OAI-SearchBot supports ChatGPT search and shopping discovery. ChatGPT-User fetches pages on behalf of a user, and OAI-AdsBot validates submitted ad landing pages. Do not treat them as one crawler.
Does blocking AI bots improve my site security?
No. Crawlers only view public pages, just like Googlebot. Blocking them does not secure private data. The business impact depends on the role: blocking search/user agents hurts organic discovery, while blocking training or ads-validation bots has a different meaning.
What is the difference between robots.txt Disallow and blocking AI crawlers?
A 'Disallow: /' rule under 'User-agent: *' blocks every crawler that has no group of its own, including Googlebot. A bot-specific block affects only the named agent. The recommended approach is to allow public product and content pages for useful discovery agents while blocking sensitive paths like /admin and /cart.
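The role separation described in the answers above can be checked mechanically. This is a minimal sketch, assuming the role grouping given in the first answer; the `audit` function, the sample rules, and the URL are hypothetical, and the grouping should be adjusted to your own policy.

```python
from urllib.robotparser import RobotFileParser

# Role grouping taken from the FAQ above; treat it as an assumption.
ROLES = {
    "discovery": ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
                  "Claude-SearchBot", "Claude-User"],
    "training": ["GPTBot"],
    "ads": ["OAI-AdsBot"],
}


def audit(robots_txt: str, url: str) -> dict[str, list[str]]:
    """Return, per role, the agents that robots.txt blocks from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        role: [agent for agent in agents if not parser.can_fetch(agent, url)]
        for role, agents in ROLES.items()
    }


# Hypothetical rules that block one training bot and one discovery bot.
SAMPLE_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

# Hypothetical product URL.
SAMPLE_URL = "https://your-store.com/products/example-product"

if __name__ == "__main__":
    for role, blocked in audit(SAMPLE_RULES, SAMPLE_URL).items():
        print(f"{role}: blocked = {blocked}")
```

A non-empty list under "discovery" is the case that hurts organic visibility; entries under "training" or "ads" reflect a separate consent or paid-media decision.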