GPTBot – OpenAI’s new web crawler

OpenAI has published information about its new web crawler named GPTBot. You can read the documentation on GPTBot over here.

What is GPTBot. GPTBot is OpenAI’s web crawler. OpenAI uses it to crawl the web and gather knowledge for its AI features, such as ChatGPT, which use that content to provide AI-generated answers to your questions.

User agent. GPTBot’s user-agent token is “GPTBot” and its full user-agent string is “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)”.
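To spot GPTBot in your server logs or request handlers, you can look for the GPTBot token in the User-Agent header. The Python sketch below is one way to do that. The function name gptbot_version is hypothetical, and the version pattern (digits separated by dots) is an assumption. Because any client can send any User-Agent header, a match only tells you that a request claims to be GPTBot.

```python
import re
from typing import Optional

# Matches the "GPTBot/<version>" product token anywhere in a User-Agent
# header, ignoring case. The digits-and-dots version format is an assumption.
_GPTBOT_RE = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version claimed in a User-Agent header, or None."""
    match = _GPTBOT_RE.search(user_agent)
    return match.group(1) if match else None


full_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)
print(gptbot_version(full_ua))  # prints 1.0
print(gptbot_version("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # prints None
```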

Robots.txt. You can use your robots.txt file to block GPTBot from accessing all or parts of your website. To block GPTBot from your entire site, add the following to your site’s robots.txt:

User-agent: GPTBot
Disallow: /
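If you want to check a rule like this before deploying it, Python’s standard-library urllib.robotparser can evaluate robots.txt content for a given user-agent token. Treat it as an approximation: it is not the parser OpenAI uses, and example.com is a placeholder for your own domain.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the product token ("GPTBot"), not the full user-agent string:
# robotparser only compares the part of the name before the first "/".
for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/any/page.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This prints “GPTBot: blocked” and “Googlebot: allowed”, since the group only names GPTBot and crawlers without a matching group are allowed by default.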

To allow GPTBot to access only some parts of your site, add the GPTBot token to your site’s robots.txt with Allow and Disallow rules like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
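The same standard-library parser can show how these partial rules apply to individual paths. One caveat: Python’s robotparser applies the first matching rule in file order, while RFC 9309 crawlers use the most specific (longest) match. For these two non-overlapping directories both approaches give the same answer. The paths and the example.com domain are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check one path in each directory, plus one the rules do not mention.
for path in ("/directory-1/page.html", "/directory-2/page.html", "/other/page.html"):
    allowed = parser.can_fetch("GPTBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Note that /other/page.html comes back allowed: Allow and Disallow only cover the paths they name, and anything unmatched is allowed by default. To let GPTBot into /directory-1/ and nowhere else, you would add a Disallow: / line after the Allow line.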

GPTBot IP ranges. OpenAI also published the IP ranges that GPTBot uses over here. The list currently contains one range, but I suspect OpenAI will add more over time.
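Because a user-agent string is easy to spoof, the published IP ranges give you a stronger way to confirm that a request really comes from GPTBot. Here is a minimal Python sketch using the standard ipaddress module. The range in it is a placeholder (192.0.2.0/24 is reserved for documentation), not OpenAI’s actual list, and is_gptbot_ip and classify_request are hypothetical names.

```python
import ipaddress

# Placeholder range for illustration only. Replace it with the
# ranges OpenAI actually publishes for GPTBot.
GPTBOT_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def is_gptbot_ip(addr: str) -> bool:
    """True if addr falls inside one of the configured GPTBot ranges."""
    ip = ipaddress.ip_address(addr)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions are safe.
    return any(ip in net for net in GPTBOT_RANGES)


def classify_request(user_agent: str, remote_addr: str) -> str:
    """Combine the user-agent claim with the source IP check."""
    if "gptbot/" not in user_agent.lower():
        return "not GPTBot"
    if is_gptbot_ip(remote_addr):
        return "verified GPTBot"
    return "unverified (possible spoof)"


ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"
print(classify_request(ua, "192.0.2.10"))   # prints verified GPTBot
print(classify_request(ua, "203.0.113.5"))  # prints unverified (possible spoof)
```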

Why we care. If you do not want GPTBot crawling your site or using your content for OpenAI’s purposes, you can disallow GPTBot in your robots.txt. This is the same protocol you would use to block Googlebot, Bingbot or other web crawlers.


About the author

Barry Schwartz

Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry can be followed on Twitter here.
