How to Block GPTBot from Scraping Your WordPress Site

Learn how to protect your WordPress content from OpenAI's GPTBot crawler and prevent unauthorized data collection for AI training.

#GPTBot #WordPress #Security #Tutorial

OpenAI's GPTBot crawls the web to collect content that may be used to train future AI models. If you don't want your WordPress content used this way without your permission, here's how to block it effectively.

Why Block GPTBot?

GPTBot collects content from websites to train AI models. While this helps improve AI capabilities, you may want to:

  • Protect your original content
  • Control how your work is used
  • Enforce your copyright and licensing preferences
  • Reduce server load from crawlers
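
Method 1: Using robots.txt

The simplest way is to add a rule for GPTBot to the robots.txt file at the root of your site (for example, https://example.com/robots.txt):

    User-agent: GPTBot
    Disallow: /

However, robots.txt is advisory: it only works if the crawler chooses to honor it. OpenAI states that GPTBot respects these rules, but nothing on your server enforces them.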
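Before deploying the rule, you can check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser` to confirm that the two-line rule above blocks GPTBot while Googlebot and the Facebook (`facebookexternalhit`) and Twitter (`Twitterbot`) preview crawlers stay allowed. The `check_rules` helper, the agent list, and the example URL are illustrative assumptions, and individual crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from Method 1.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

# Crawler tokens to check (assumption: adjust to the bots you care about).
# GPTBot should be blocked; search and social-preview crawlers should not.
AGENTS = ["GPTBot", "Googlebot", "facebookexternalhit", "Twitterbot"]


def check_rules(robots_txt: str, url: str) -> dict[str, bool]:
    """Return {agent: True if allowed to fetch url} for each token in AGENTS."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    results = check_rules(ROBOTS_TXT, "https://example.com/sample-post/")
    for agent, allowed in results.items():
        print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

Pass the crawler's short token (such as `GPTBot`) rather than its full user-agent string, since `robotparser` only compares the part of the agent name before the first `/`.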

Method 2: Using the AI Crawler Guard Plugin

For WordPress sites, the AI Crawler Guard plugin provides:

  • Automatic detection of GPTBot and other AI crawlers
  • One-click blocking without editing files
  • Activity logs to see what's being blocked
  • No impact on legitimate search engines or social previews
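
Method 3: Server-Level Blocking

If you have server access, you can block GPTBot at the web server level with .htaccess rules (Apache) or nginx configuration that match its User-Agent header and refuse the request. Unlike robots.txt, this is enforced by your server for any request that identifies itself as GPTBot.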
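Server-level rules for Apache and nginx generally work by matching the `User-Agent` request header and answering with 403 Forbidden. The Python sketch below models only that decision so the matching logic is easy to test; the comments show roughly equivalent mod_rewrite and nginx directives, which you should treat as a starting point to adapt to your own configuration. The `response_status` function, the blocked-token tuple, and the sample user-agent strings are illustrative assumptions.

```python
# Models the decision a server-level User-Agent rule makes. Roughly
# equivalent directives (adapt to your own setup):
#   Apache .htaccess (requires mod_rewrite):
#     RewriteEngine On
#     RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
#     RewriteRule .* - [F,L]
#   nginx (inside a server block):
#     if ($http_user_agent ~* "GPTBot") { return 403; }

# Lowercase tokens matched as case-insensitive substrings of the
# User-Agent header (assumption: extend this tuple for other crawlers).
BLOCKED_TOKENS = ("gptbot",)


def response_status(user_agent: str) -> int:
    """Return 403 (Forbidden) for a blocked User-Agent, otherwise 200 (OK)."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return 403
    return 200


if __name__ == "__main__":
    # Illustrative user-agent strings, not captured from real traffic.
    samples = [
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        print(response_status(ua), ua)
```

Because this matches a self-reported header, it stops requests that identify themselves as GPTBot; a client sending a different User-Agent string will not be caught.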

Best Practices

1. **Monitor First**: Before blocking, monitor what GPTBot is accessing.
2. **Test Social Previews**: Ensure blocking doesn't break Facebook/Twitter previews.
3. **Check Analytics**: Verify legitimate traffic isn't affected.
4. **Document Your Choice**: Keep records of your blocking decisions.
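For the "Monitor First" step, one way to see what GPTBot has requested is to scan your web server's access log. The sketch below assumes the common Apache/nginx "combined" log format and counts (path, status) pairs for any request whose user agent contains `GPTBot`; counting the status as well lets you rerun it after blocking to confirm those requests now get 403. The regex, the `gptbot_requests` function, and the log path in the usage comment are hypothetical and may need adjusting for a custom log format.

```python
import re
import sys
from collections import Counter

# Matches the Apache/nginx "combined" log format (assumption: adjust the
# pattern if your server uses a custom format):
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def gptbot_requests(lines):
    """Count (path, status) pairs for log lines whose user agent mentions GPTBot."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match and "gptbot" in match.group("agent").lower():
            counts[(match.group("path"), match.group("status"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python gptbot_log.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (path, status), n in gptbot_requests(log).most_common(20):
            print(f"{n:6}  {status}  {path}")
```

Lines that don't match the pattern are skipped silently, so a report with no results can also mean the log format differs from the assumed one.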

Conclusion

Blocking GPTBot is straightforward with the right tools. Choose the method that fits your technical expertise and site requirements.

Written by

AI Crawler Guard Team