Web

robots.txt parser + crawl-permission check API

Fetches and parses a site's robots.txt into per-user-agent allow/disallow groups with their crawl-delay values, collects Sitemap directives, and, when given a test path, returns a deterministic crawl verdict using longest-match precedence: among the rules that match the path, the one with the longest pattern wins. This is the part of robots.txt matching that callers most often get wrong. Typical questions it answers: 'can I crawl this URL', 'what does this site's robots.txt allow', 'find the sitemap from robots.txt', 'is this path disallowed for my bot'.
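
To make the longest-match rule concrete, here is a minimal Python sketch of how such a verdict can be computed. It is not the service's implementation, which the listing does not show. It follows the rule as described in RFC 9309: the matching pattern with the most characters wins, a tie goes to Allow, `*` matches any run of characters, and a trailing `$` anchors the end of the path. The `verdict` function, its `(directive, pattern)` rule format, and the sample rules are all hypothetical.

```python
import re

def _pattern_to_regex(pattern: str) -> re.Pattern:
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def verdict(rules, path):
    """rules: list of (directive, pattern), directive is 'allow' or 'disallow'."""
    best = None  # (pattern length, is_allow, directive, pattern)
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        # re.match anchors at the start of the path, as robots rules require.
        if _pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow", directive, pattern)
            # Longer pattern wins; on equal length, Allow (True) beats Disallow.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return {"allowed": True, "matchedRule": None}  # no rule matched
    return {"allowed": best[1], "matchedRule": f"{best[2].capitalize()}: {best[3]}"}

# Two hypothetical rules: the longer Disallow pattern overrides the shorter Allow.
rules = [("allow", "/wiki/"), ("disallow", "/wiki/Special:")]
print(verdict(rules, "/wiki/Special:Random"))
# {'allowed': False, 'matchedRule': 'Disallow: /wiki/Special:'}
```

A first-match implementation would return "allowed" here because the Allow rule is listed first, which is the usual mistake this rule avoids.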

Price: $0.01 per request
Method: POST
Route: /v1/web/robots
Status: Live
MIME type: application/json
Rate limit: 60/minute
Cache: 3600s public
Tags: web, robots, robots-txt, crawl, crawler, sitemap, scraping, seo
API URL: https://x402.hexl.dev/v1/web/robots
Example request
{
  "url": "https://en.wikipedia.org",
  "testPath": "/wiki/Special:Random",
  "testUserAgent": "*"
}
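As a rough sketch, the request above could be assembled in Python with the standard library as follows. The `build_request` helper is a hypothetical name, and the code only builds the request without sending it. The listing shows a per-request price, so an unpaid call may be rejected; payment handling is not covered here.

```python
import json
import urllib.request

API_URL = "https://x402.hexl.dev/v1/web/robots"  # API URL from the listing

def build_request(url, test_path=None, test_user_agent=None):
    """Build (but do not send) a POST request for the robots endpoint."""
    payload = {"url": url}  # 'url' is the only required field
    if test_path is not None:
        payload["testPath"] = test_path
    if test_user_agent is not None:
        payload["testUserAgent"] = test_user_agent
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("https://en.wikipedia.org", "/wiki/Special:Random", "*")
# Sending is left to the caller, for example urllib.request.urlopen(req).
```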
Example response
{
  "url": "https://en.wikipedia.org/robots.txt",
  "groupCount": 30,
  "sitemaps": [
    "https://en.wikipedia.org/sitemap.xml"
  ],
  "hasWildcardGroup": true,
  "test": {
    "userAgent": "*",
    "path": "/wiki/Special:Random",
    "allowed": false,
    "matchedRule": "Disallow: /wiki/Special:",
    "reason": "longest-match"
  },
  "fetched": true,
  "groups": []
}
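A caller will usually care about `test.allowed` and `sitemaps`. The sketch below reads those fields defensively, since the output schema leaves the response shape open. The `summarize` helper is hypothetical, and treating `fetched: false` as "no usable verdict" is an assumption rather than documented behavior.

```python
def summarize(response: dict) -> str:
    """Turn a robots-check response into a one-line summary."""
    # Assumption: when 'fetched' is false, the verdict should not be relied on.
    if not response.get("fetched", False):
        return "robots.txt could not be fetched"
    parts = []
    test = response.get("test")
    if test:  # present only when a testPath was supplied (assumed)
        state = "allowed" if test.get("allowed") else "disallowed"
        parts.append(
            f"{test.get('path')} is {state} for {test.get('userAgent')} "
            f"({test.get('matchedRule')})"
        )
    parts.append(f"{len(response.get('sitemaps', []))} sitemap(s)")
    return "; ".join(parts)

example = {
    "url": "https://en.wikipedia.org/robots.txt",
    "sitemaps": ["https://en.wikipedia.org/sitemap.xml"],
    "test": {
        "userAgent": "*",
        "path": "/wiki/Special:Random",
        "allowed": False,
        "matchedRule": "Disallow: /wiki/Special:",
        "reason": "longest-match",
    },
    "fetched": True,
}
print(summarize(example))
# /wiki/Special:Random is disallowed for * (Disallow: /wiki/Special:); 1 sitemap(s)
```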
Input schema
{
  "type": "object",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "type": "string",
      "examples": [
        "https://example.com"
      ]
    },
    "testPath": {
      "type": "string",
      "examples": [
        "/private/page.html"
      ]
    },
    "testUserAgent": {
      "type": "string",
      "examples": [
        "Googlebot"
      ]
    }
  }
}
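Because each call is billed, it may be worth checking a payload against this schema before sending it. The following is a minimal hand-written check that mirrors the schema above (one required string field, two optional string fields). `validate_input` is a hypothetical helper, not part of the API, and a full JSON Schema validator would be the more general choice.

```python
def validate_input(payload) -> list:
    """Return a list of schema violations; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    if "url" not in payload:
        errors.append("missing required field: url")
    # Every declared property is typed as a string in the schema.
    for field in ("url", "testPath", "testUserAgent"):
        if field in payload and not isinstance(payload[field], str):
            errors.append(f"{field} must be a string")
    return errors

print(validate_input({"testPath": "/private/page.html"}))
# ['missing required field: url']
```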
Output schema
{
  "type": "object",
  "additionalProperties": true
}