Your security rules are blocking the traffic you actually want. If search engines and LLM crawlers can’t reach your content, they can’t index it, train on it, or show it in traditional or AI search interfaces.

While your team focuses on stopping malicious bots, good crawlers get caught in the crossfire. DuckDuckGo processes 3 billion searches monthly. Common Crawl powers the training data behind major AI models. Block them, and your content becomes invisible to privacy-conscious users and AI-powered search.

Last Friday (13 June 2025), DuckDuckGo shipped a quiet but important upgrade: their crawler IP ranges are now available as structured data. No more brittle HTML parsing. No more manual updates. Just clean, fast, automatable bot management.

Key Takeaways

1. DuckDuckGo now exposes its crawler IP ranges as structured JSON
2. Common Crawl is preparing to expose its IP ranges as structured JSON
3. The new approach replaces fragile HTML pages and makes change-detection trivial (curl, jq, checksum)
4. Safelist the ranges in your WAF, or let a bot-management service (Akamai, Vercel, etc.) handle it
5. Blocking these “good bots” means losing 3 billion DuckDuckGo searches a month, plus the dataset that fuels many LLMs
6. We built a free search engine IP tracker that monitors every change hourly, with years of history → search-engine-ip-tracker.merj.com/status

What Changed

Historically, most crawler IPs were published in HTML pages or buried in documentation. That worked…until it didn’t.

  • Fragile parsing: DOM structures change without warning
  • Slow verification: reverse DNS validation is too slow to be usable at scale, though some still rely on it (see the sketch below)
  • Security gaps: delayed updates mean legitimate traffic gets blocked, or teams keep enforcing stale lists
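For context, the traditional alternative to a published list is two-step DNS verification of every connecting IP: a reverse (PTR) lookup, then a forward lookup to confirm the hostname resolves back to the same address. A rough sketch with dig (the IP below is a placeholder from the TEST-NET-2 documentation range):

# Two DNS round trips for every connecting IP: the cost the JSON lists remove
ip="198.51.100.7"              # placeholder; substitute the connecting IP
ptr=$(dig +short -x "$ip")     # reverse lookup: IP → hostname
dig +short "$ptr"              # forward lookup: should return $ip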

Google pioneered this pattern by publishing a structured endpoint and schema for its crawler IPs, and others are now adopting the same format.
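For example, Google’s Googlebot list is served (at the time of writing) from https://developers.google.com/static/search/apis/ipranges/googlebot.json, and a one-liner with jq confirms it matches the shape defined below:

curl -s https://developers.google.com/static/search/apis/ipranges/googlebot.json \
    | jq '{creationTime, count: (.prefixes | length)}'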

The schema itself is defined as follows (you can also skip to the example below):

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "IP Prefix List",
    "type": "object",
    "properties": {
        "creationTime": {
            "type": "string",
            "format": "date-time"
        },
        "prefixes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "ipv4Prefix": {
                        "type": "string",
                        "format": "ipv4-cidr"
                    },
                    "ipv6Prefix": {
                        "type": "string",
                        "format": "ipv6-cidr"
                    }
                },
                "additionalProperties": false,
                "oneOf": [
                    {
                        "required": [
                            "ipv4Prefix"
                        ]
                    },
                    {
                        "required": [
                            "ipv6Prefix"
                        ]
                    }
                ]
            }
        }
    },
    "required": [
        "creationTime",
        "prefixes"
    ],
    "additionalProperties": false
}

A real-world example for Common Crawl, using both IPv4 and IPv6, looks like this:

{
    "creationTime": "2025-06-05T12:00:00.000000",
    "prefixes": [
        {
            "ipv6Prefix": "2600:1f28:365:80b0::/60"
        },
        {
            "ipv4Prefix": "8.97.9.168/29"
        },
        {
            "ipv4Prefix": "18.97.14.80/29"
        },
        {
            "ipv4Prefix": "18.97.14.88/30"
        },
        {
            "ipv4Prefix": "98.85.178.216/32"
        }
    ]
}
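Because this is standard JSON Schema, you can validate any endpoint’s payload with a generic validator. A sketch using the check-jsonschema CLI (a Python tool, assumed installed; ip-prefix-list.schema.json is the schema above saved locally):

curl -s https://duckduckgo.com/duckduckbot.json -o duckduckbot.json
check-jsonschema --schemafile ip-prefix-list.schema.json duckduckbot.json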

You can track, verify, and act on them with a single curl command.
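For example, this pulls every CIDR from the DuckDuckBot endpoint listed below, one prefix per line and ready to drop into an allowlist (jq assumed installed):

curl -s https://duckduckgo.com/duckduckbot.json \
    | jq -r '.prefixes[] | .ipv4Prefix // .ipv6Prefix'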

The endpoints published so far:

  • DuckDuckBot
    Documentation: https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot
    JSON endpoint: https://duckduckgo.com/duckduckbot.json
  • DuckAssistBot
    Documentation: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot
    JSON endpoint: https://duckduckgo.com/duckassistbot.json
  • CCBot
    Documentation: linked in the Common Crawl FAQ
    JSON endpoint: coming soon…

What You Should Do

  1. Safelist these bots. Audit your WAF or firewall blocks to make sure DuckDuckGo and Common Crawl are explicitly allowed
  2. Automate the check. Monitor the JSON for changes using sha256sum, or parse it into your bot allowlist pipeline (see the sketch after this list)
  3. Check your traffic. Are you getting referrals from DuckDuckGo? If not, it may be blocked upstream
  4. Use bot management where possible. Platforms like Vercel and Akamai can manage this for you automatically
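A minimal change-detection sketch for step 2, using the DuckDuckBot endpoint (the state-file path is illustrative; run it hourly from cron or similar, and plug in your own alerting):

#!/usr/bin/env bash
# Alert when a published IP list changes, by comparing SHA-256 checksums.
set -euo pipefail

url="https://duckduckgo.com/duckduckbot.json"
state="$HOME/.cache/duckduckbot.sha256"    # illustrative state file

new=$(curl -sf "$url" | sha256sum | cut -d ' ' -f 1)
old=$(cat "$state" 2>/dev/null || true)

if [ "$new" != "$old" ]; then
    printf '%s\n' "$new" > "$state"
    echo "DuckDuckBot IP ranges changed: refresh your allowlist"
fi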

Why I Care

I’m passionate about helping companies standardise formats that can benefit millions of businesses. When I reached out to the CTO of Common Crawl and the CEO of DuckDuckGo about adopting machine-readable standards, both moved quickly to implement the changes. I want to thank them for their responsiveness and leadership. This is exactly the kind of industry collaboration we need more of.

The Bigger Picture

This is part of a broader shift. As AI-native search grows, so does the importance of structured, machine-readable crawler data:

  • Expect more LLM search engines to follow (xAI, Anthropic, etc.)
  • A shared schema could allow edge platforms to ingest new IPs automatically (the sketch below shows the idea)
  • Until then, standard JSON endpoints are the baseline we should expect
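As a taste of that automation, here is a sketch that turns the DuckDuckBot list into nginx allow rules (the output filename is illustrative; any WAF or proxy with a CIDR allowlist format works the same way):

curl -s https://duckduckgo.com/duckduckbot.json \
    | jq -r '.prefixes[] | "allow " + (.ipv4Prefix // .ipv6Prefix) + ";"' \
    > duckduckbot-allow.conf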

Conclusion

If a bot can’t reach your site, it can’t index it, it can’t train on it, and it can’t surface it in the increasingly fragmented, AI-powered search results of the future.

DuckDuckGo and Common Crawl are making it easier to tackle this problem head-on and take advantage of the opportunities.

Make sure the bots you want are getting through.