robots.txt parser + crawl-permission check API

Fetches and parses a site's robots.txt into per-user-agent allow/disallow groups with crawl-delay, discovers Sitemap directives, and, given a test path, returns a deterministic crawl verdict using longest-match precedence. The value-add is that it implements the robots.txt matching rule callers often get wrong: when several rules match a path, the most specific (longest) one decides. It answers questions such as 'can I crawl this URL', 'what does this site's robots.txt allow', 'find the sitemap from robots.txt', and 'is this path disallowed for my bot'.
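The longest-match precedence mentioned above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names `verdict` and `_pattern_to_regex` are hypothetical, the tie-break (Allow wins when two matching patterns have equal length) follows the convention in RFC 9309, and the sketch measures pattern length in characters and skips percent-encoding normalization.

```python
import re


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where the pattern had '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def verdict(rules, path):
    """Decide whether `path` may be crawled.

    rules: list of ("allow" | "disallow", pattern) for the already-selected
    user-agent group (hypothetical input shape).
    Returns (allowed, matched_rule). The longest matching pattern wins;
    on equal length Allow wins; no matching rule means allowed.
    """
    best = None  # (pattern_length, is_allow, kind, pattern)
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow", kind, pattern)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, None
    return best[1], f"{best[2].capitalize()}: {best[3]}"


# Illustrative rules only; the Allow line is invented for the example.
rules = [("allow", "/wiki/"), ("disallow", "/wiki/Special:")]
print(verdict(rules, "/wiki/Special:Random"))
```

A naive first-match scan over the same rules would return the `Allow: /wiki/` rule, which is the mistake the longest-match rule avoids.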

Price: $0.01 per request

Method: POST

Route: /v1/web/robots

Status: Live

MIME type: application/json

Rate limit: 60/minute

Cache: 3600s public

Tags: web, robots, robots-txt, crawl, crawler, sitemap, scraping, seo

API URL: https://x402.hexl.dev/v1/web/robots

Integration docs

Example request

{
  "url": "https://en.wikipedia.org",
  "testPath": "/wiki/Special:Random",
  "testUserAgent": "*"
}
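As a rough sketch, the example request could be assembled in Python with the standard library as shown below. The helper name `build_robots_request` is hypothetical. The listing prices each call at $0.01 but does not describe how payment is made, so this sketch only builds the request and does not send it; any payment headers or handshake would have to come from the integration docs.

```python
import json
import urllib.request

API_URL = "https://x402.hexl.dev/v1/web/robots"


def build_robots_request(url, test_path=None, test_user_agent=None):
    """Build (but do not send) a POST request for the robots endpoint."""
    payload = {"url": url}  # the only required field in the input schema
    if test_path is not None:
        payload["testPath"] = test_path
    if test_user_agent is not None:
        payload["testUserAgent"] = test_user_agent
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_robots_request("https://en.wikipedia.org", "/wiki/Special:Random", "*")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending would be a matter of passing `req` to `urllib.request.urlopen`, once whatever payment step the service requires is in place.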

Example response

{
  "url": "https://en.wikipedia.org/robots.txt",
  "groupCount": 30,
  "sitemaps": [
    "https://en.wikipedia.org/sitemap.xml"
  ],
  "hasWildcardGroup": true,
  "test": {
    "userAgent": "*",
    "path": "/wiki/Special:Random",
    "allowed": false,
    "matchedRule": "Disallow: /wiki/Special:",
    "reason": "longest-match"
  },
  "fetched": true,
  "groups": []
}
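A caller would typically reduce a response like the one above to a single verdict. The sketch below does that with a hypothetical helper, `summarize_verdict`. Because the output schema is open-ended, two behaviours here are assumptions rather than documented facts: that `fetched` is false when robots.txt could not be retrieved, and that `test` is absent when no `testPath` was sent.

```python
import json


def summarize_verdict(response: dict) -> str:
    """Turn a robots-check response into a one-line, human-readable verdict."""
    if not response.get("fetched"):
        # Assumption: a falsy "fetched" means robots.txt was not retrieved.
        return "robots.txt could not be fetched; no verdict available"
    test = response.get("test")
    if test is None:
        # Assumption: "test" is omitted when the request had no testPath.
        return "no test path was supplied"
    state = "allowed" if test["allowed"] else "disallowed"
    rule = test.get("matchedRule") or "no matching rule"
    return f'{test["path"]} is {state} for {test["userAgent"]} ({rule})'


example = json.loads("""
{
  "url": "https://en.wikipedia.org/robots.txt",
  "fetched": true,
  "test": {
    "userAgent": "*",
    "path": "/wiki/Special:Random",
    "allowed": false,
    "matchedRule": "Disallow: /wiki/Special:",
    "reason": "longest-match"
  }
}
""")
print(summarize_verdict(example))
```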

Input schema

{
  "type": "object",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "type": "string",
      "examples": [
        "https://example.com"
      ]
    },
    "testPath": {
      "type": "string",
      "examples": [
        "/private/page.html"
      ]
    },
    "testUserAgent": {
      "type": "string",
      "examples": [
        "Googlebot"
      ]
    }
  }
}
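Since every call is billed, it can be worth checking a payload against the input schema before sending it. The following is a hand-written sketch of the constraints the schema states (one required string `url`, two optional string fields); the function name `validate_input` is hypothetical, and it is not a general JSON Schema validator.

```python
def validate_input(payload):
    """Return a list of schema violations; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    if "url" not in payload:
        errors.append("missing required field: url")
    for field in ("url", "testPath", "testUserAgent"):
        if field in payload and not isinstance(payload[field], str):
            errors.append(f"{field} must be a string")
    # The schema does not forbid extra properties, so unknown keys are accepted.
    return errors


print(validate_input({"url": "https://example.com", "testPath": "/private/page.html"}))
print(validate_input({"testUserAgent": "Googlebot"}))
```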

Output schema

{
  "type": "object",
  "additionalProperties": true
}