Your security rules are blocking the traffic you actually want. If search engines and LLM crawlers can’t reach your content, they can’t index it, train on it, or show it in traditional or AI search interfaces.

While your team focuses on stopping malicious bots, good crawlers get caught in the crossfire. DuckDuckGo processes 3 billion searches monthly. Common Crawl powers the training data behind major AI models. Block them, and your content becomes invisible to privacy-conscious users and AI-powered search.

Last Friday (13 June 2025), DuckDuckGo shipped a quiet but important upgrade: their crawler IP ranges are now available as structured data. No more brittle HTML parsing. No more manual updates. Just clean, fast, automatable bot management.

Key Takeaways

1. DuckDuckGo now exposes its crawler IP ranges as structured JSON
2. Common Crawl is preparing to expose its IP ranges as structured JSON
3. The new approach replaces fragile HTML pages and makes change-detection trivial (curl, jq, checksum)
4. Safelist the ranges in your WAF, or let a bot-management service (Akamai, Vercel, etc.) handle it
5. Blocking these “good bots” means losing 3 billion DuckDuckGo searches a month, plus the dataset that fuels many LLMs
6. We built a free search engine IP tracker that monitors every change hourly, with years of history → search-engine-ip-tracker.merj.com/status

What Changed

Historically, most crawler IPs were published in HTML pages or buried in documentation. That worked…until it didn’t.

  • Fragile parsing: DOM structures change without warning
  • Slow verification: reverse DNS validation is too slow to be usable at scale, though some still rely on it (see the sketch below)
  • Security gaps: delayed updates mean legitimate traffic gets blocked, or teams keep enforcing stale lists
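For context, the traditional alternative to a published list is two-step DNS verification of every connecting IP: a reverse (PTR) lookup, then a forward lookup to confirm the hostname resolves back to the same address. A rough sketch with dig (the IP below is a placeholder from the TEST-NET-2 documentation range):

# Two DNS round trips for every connecting IP: the cost the JSON lists remove
ip="198.51.100.7"              # placeholder; substitute the connecting IP
ptr=$(dig +short -x "$ip")     # reverse lookup: IP → hostname
dig +short "$ptr"              # forward lookup: should return $ip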

Google pioneered this pattern by publishing a structured endpoint and schema for its crawler IPs, and others are now adopting the same format.
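For example, Google’s Googlebot list is served (at the time of writing) from https://developers.google.com/static/search/apis/ipranges/googlebot.json, and a one-liner with jq confirms it matches the shape defined below:

curl -s https://developers.google.com/static/search/apis/ipranges/googlebot.json \
    | jq '{creationTime, count: (.prefixes | length)}'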

The schema itself is defined as follows (you can also skip to the example below):

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "IP Prefix List",
    "type": "object",
    "properties": {
        "creationTime": {
            "type": "string",
            "format": "date-time"
        },
        "prefixes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "ipv4Prefix": {
                        "type": "string",
                        "format": "ipv4-cidr"
                    },
                    "ipv6Prefix": {
                        "type": "string",
                        "format": "ipv6-cidr"
                    }
                },
                "additionalProperties": false,
                "oneOf": [
                    {
                        "required": [
                            "ipv4Prefix"
                        ]
                    },
                    {
                        "required": [
                            "ipv6Prefix"
                        ]
                    }
                ]
            }
        }
    },
    "required": [
        "creationTime",
        "prefixes"
    ],
    "additionalProperties": false
}

A real-world example for Common Crawl, using both IPv4 and IPv6, looks like this:

{
    "creationTime": "2025-06-05T12:00:00.000000",
    "prefixes": [
        {
            "ipv6Prefix": "2600:1f28:365:80b0::/60"
        },
        {
            "ipv4Prefix": "8.97.9.168/29"
        },
        {
            "ipv4Prefix": "18.97.14.80/29"
        },
        {
            "ipv4Prefix": "18.97.14.88/30"
        },
        {
            "ipv4Prefix": "98.85.178.216/32"
        }
    ]
}
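Because this is standard JSON Schema, you can validate any endpoint’s payload with a generic validator. A sketch using the check-jsonschema CLI (a Python tool, assumed installed; ip-prefix-list.schema.json is the schema above saved locally):

curl -s https://duckduckgo.com/duckduckbot.json -o duckduckbot.json
check-jsonschema --schemafile ip-prefix-list.schema.json duckduckbot.json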

You can track, verify, and act on them with a single curl command.
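For example, this pulls every CIDR from the DuckDuckBot endpoint listed below, one prefix per line and ready to drop into an allowlist (jq assumed installed):

curl -s https://duckduckgo.com/duckduckbot.json \
    | jq -r '.prefixes[] | .ipv4Prefix // .ipv6Prefix'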

The endpoints published so far:

  • DuckDuckBot
    Documentation: https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot
    JSON endpoint: https://duckduckgo.com/duckduckbot.json
  • DuckAssistBot
    Documentation: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot
    JSON endpoint: https://duckduckgo.com/duckassistbot.json
  • CCBot
    Documentation: linked in the Common Crawl FAQ
    JSON endpoint: coming soon…

What You Should Do

  1. Safelist these bots. Audit your WAF or firewall blocks to make sure DuckDuckGo and Common Crawl are explicitly allowed
  2. Automate the check. Monitor the JSON for changes using sha256sum, or parse it into your bot allowlist pipeline (see the sketch after this list)
  3. Check your traffic. Are you getting referrals from DuckDuckGo? If not, it may be blocked upstream
  4. Use bot management where possible. Platforms like Vercel and Akamai can manage this for you automatically
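A minimal change-detection sketch for step 2, using the DuckDuckBot endpoint (the state-file path is illustrative; run it hourly from cron or similar, and plug in your own alerting):

#!/usr/bin/env bash
# Alert when a published IP list changes, by comparing SHA-256 checksums.
set -euo pipefail

url="https://duckduckgo.com/duckduckbot.json"
state="$HOME/.cache/duckduckbot.sha256"    # illustrative state file

new=$(curl -sf "$url" | sha256sum | cut -d ' ' -f 1)
old=$(cat "$state" 2>/dev/null || true)

if [ "$new" != "$old" ]; then
    printf '%s\n' "$new" > "$state"
    echo "DuckDuckBot IP ranges changed: refresh your allowlist"
fi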

Why I Care

I’m passionate about helping companies standardise formats that can benefit millions of businesses. When I reached out to the CTO of Common Crawl and the CEO of DuckDuckGo about adopting machine-readable standards, both moved quickly to implement the changes. I want to thank them for their responsiveness and leadership. This is exactly the kind of industry collaboration we need more of.

The Bigger Picture

This is part of a broader shift. As AI-native search grows, so does the importance of structured, machine-readable crawler data:

  • Expect more LLM search engines to follow (xAI, Anthropic, etc.)
  • A shared schema could allow edge platforms to ingest new IPs automatically (the sketch below shows the idea)
  • Until then, standard JSON endpoints are the baseline we should expect
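As a taste of that automation, here is a sketch that turns the DuckDuckBot list into nginx allow rules (the output filename is illustrative; any WAF or proxy with a CIDR allowlist format works the same way):

curl -s https://duckduckgo.com/duckduckbot.json \
    | jq -r '.prefixes[] | "allow " + (.ipv4Prefix // .ipv6Prefix) + ";"' \
    > duckduckbot-allow.conf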

Conclusion

If a bot can’t reach your site, it can’t index it, it can’t train on it, and it can’t surface it in the increasingly fragmented, AI-powered search results of the future.

DuckDuckGo and Common Crawl are making it easier to tackle this problem head-on and take advantage of the opportunities.

Make sure the bots you want are getting through.