Web
robots.txt parser + crawl-permission check API
Fetches and parses a site's robots.txt into per-user-agent allow/disallow groups with crawl-delay, discovers Sitemap directives, and, given a test path, returns a deterministic crawl verdict using longest-match precedence (the most specific matching rule wins). The value-add: it implements the fiddly robots.txt matching rule that callers usually get wrong. Answers 'can I crawl this URL', 'what does this site's robots.txt allow', 'find the sitemap from robots.txt', 'is this path disallowed for my bot'.
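To make the longest-match rule concrete, here is a minimal Python sketch of how such a verdict can be computed. It is not the service's actual implementation, which this page does not show. It follows the precedence described in RFC 9309: the longest matching pattern wins, and Allow wins a tie. The names `pattern_to_regex`, `check`, and `RULES` are hypothetical, and the `Allow: /wiki/` rule is made up for illustration; only `Disallow: /wiki/Special:` comes from the example response on this page.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (directive, pattern), e.g. ("disallow", "/wiki/Special:")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal chunk, then join the chunks with '.*' for the wildcards.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def check(rules: List[Rule], path: str) -> Tuple[bool, Optional[Rule]]:
    """Return (allowed, winning_rule) for one user-agent group.

    Simplifications (assumptions of this sketch): pattern length is counted in
    characters rather than octets, and percent-encoding is not normalized.
    """
    best: Optional[Rule] = None
    for directive, pattern in rules:
        directive = directive.lower()
        # An empty pattern matches nothing; re.match anchors at the path start.
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        longer = best is None or len(pattern) > len(best[1])
        tie_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and directive == "allow"
        )
        if longer or tie_allow:
            best = (directive, pattern)
    if best is None:
        return True, None  # no rule matched: crawling is allowed by default
    return best[0] == "allow", best


# Hypothetical rule group; only the Disallow line appears in the example response.
RULES: List[Rule] = [("allow", "/wiki/"), ("disallow", "/wiki/Special:")]

print(check(RULES, "/wiki/Special:Random"))  # -> (False, ('disallow', '/wiki/Special:'))
```

Both rules match `/wiki/Special:Random`, but the Disallow pattern is longer, so it decides the verdict. A first-match or "any Disallow wins" implementation would give different answers on groups that mix Allow and Disallow lines.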
Price: $0.01 per request
Method: POST
Route: /v1/web/robots
Status: Live
MIME type: application/json
Rate limit: 60/minute
Cache: 3600s public
Tags: web, robots, robots-txt, crawl, crawler, sitemap, scraping, seo
API URL
https://x402.hexl.dev/v1/web/robots
Example request
{
  "url": "https://en.wikipedia.org",
  "testPath": "/wiki/Special:Random",
  "testUserAgent": "*"
}
Example response
{
  "url": "https://en.wikipedia.org/robots.txt",
  "groupCount": 30,
  "sitemaps": [
    "https://en.wikipedia.org/sitemap.xml"
  ],
  "hasWildcardGroup": true,
  "test": {
    "userAgent": "*",
    "path": "/wiki/Special:Random",
    "allowed": false,
    "matchedRule": "Disallow: /wiki/Special:",
    "reason": "longest-match"
  },
  "fetched": true,
  "groups": []
}
Input schema
{
  "type": "object",
  "required": [
    "url"
  ],
  "properties": {
    "url": {
      "type": "string",
      "examples": [
        "https://example.com"
      ]
    },
    "testPath": {
      "type": "string",
      "examples": [
        "/private/page.html"
      ]
    },
    "testUserAgent": {
      "type": "string",
      "examples": [
        "Googlebot"
      ]
    }
  }
}
Output schema
{
  "type": "object",
  "additionalProperties": true
}
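Because the output schema is open (`additionalProperties: true`), a client cannot rely on it to guarantee any field. The following Python sketch shows one way to build a request body that satisfies the input schema and to read the verdict defensively from a response shaped like the example above. It makes no network call and does not cover payment or authentication, neither of which is described here. The function names `build_request` and `summarize` are hypothetical, and rejecting an empty `url` is an extra assumption of this sketch (the schema only requires a string).

```python
import json
from typing import Any, Dict, Optional


def build_request(
    url: str,
    test_path: Optional[str] = None,
    test_user_agent: Optional[str] = None,
) -> Dict[str, str]:
    """Build a JSON-serializable body: 'url' is required, the test fields are optional."""
    if not isinstance(url, str) or not url:
        raise ValueError("'url' is required and must be a non-empty string")
    body = {"url": url}
    if test_path is not None:
        body["testPath"] = test_path
    if test_user_agent is not None:
        body["testUserAgent"] = test_user_agent
    return body


def summarize(response: Dict[str, Any]) -> str:
    """Describe the crawl verdict, tolerating missing fields in the open schema."""
    test = response.get("test")
    if not isinstance(test, dict) or not isinstance(test.get("allowed"), bool):
        return "no test verdict in response"
    verdict = "allowed" if test["allowed"] else "disallowed"
    return (
        f"{test.get('path')} is {verdict} for {test.get('userAgent')} "
        f"({test.get('matchedRule')})"
    )


# Request body matching the example request on this page.
body = build_request("https://en.wikipedia.org", "/wiki/Special:Random", "*")
print(json.dumps(body))

# Trimmed copy of the example response on this page.
sample = {
    "test": {
        "userAgent": "*",
        "path": "/wiki/Special:Random",
        "allowed": False,
        "matchedRule": "Disallow: /wiki/Special:",
        "reason": "longest-match",
    },
    "fetched": True,
}
print(summarize(sample))
# -> /wiki/Special:Random is disallowed for * (Disallow: /wiki/Special:)
```

Omitting `testPath` yields a body with only `url`, which the input schema also accepts; in that case the sketch reports that no verdict is present rather than guessing one.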