LLM
Token budget planner API
Estimates per-message token counts (using a real GPT tokenizer plus ChatML framing overhead), reserves room for the completion, and reports whether a conversation fits a given context window, by how many tokens it overflows, and the utilization percentage. Answers questions such as 'Will these messages fit in my context window?' and 'How many tokens of headroom do I have left?'
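The arithmetic behind the reported fields can be sketched as follows. This is a reconstruction from the example response in this listing, not the service's implementation. The function name `plan_budget` is hypothetical, the content token counts are passed in rather than computed by a tokenizer, and the 3-token reply-priming constant is an assumption inferred from the example (promptTokens of 24 minus per-message totals of 8 and 13).

```python
# Assumption: a fixed allowance added once per conversation, inferred from
# the example response (24 - 8 - 13 = 3). The listing does not document it.
REPLY_PRIMING_TOKENS = 3


def plan_budget(content_tokens, context_window,
                reserve_for_completion=0, per_message_overhead=4):
    """Reproduce the budget fields from already-counted content tokens.

    content_tokens: one token count per message (message text only).
    """
    # Each message costs its content tokens plus the framing overhead.
    per_message = [n + per_message_overhead for n in content_tokens]
    prompt_tokens = sum(per_message) + REPLY_PRIMING_TOKENS
    total_tokens = prompt_tokens + reserve_for_completion
    # Assumption: overflow is measured against the whole context window.
    overflow = max(0, total_tokens - context_window)
    return {
        "availableForPrompt": context_window - reserve_for_completion,
        "promptTokens": prompt_tokens,
        "totalTokens": total_tokens,
        "fits": overflow == 0,
        "overflowTokens": overflow,
        "utilizationPct": round(100 * total_tokens / context_window, 1),
        "perMessage": per_message,
    }


# Content counts of 4 and 9 are the example's per-message totals (8 and 13)
# minus the default overhead of 4.
result = plan_budget([4, 9], context_window=1000, reserve_for_completion=200)
print(result["promptTokens"], result["totalTokens"], result["utilizationPct"])
```

With the default overhead of 4, the sketch matches the example response's promptTokens, totalTokens, and utilizationPct values. Whether the service computes them this way is not stated in the listing.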
Price: $0.02 per request
Method: POST
Route: /v1/llm/token-budget
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
llm, tokens, context-window, budget, tokenizer, planner, prompt, agent
API URL: https://x402.hexl.dev/v1/llm/token-budget
Example request
{
  "messages": [
    {
      "role": "system",
      "content": "You are helpful."
    },
    {
      "role": "user",
      "content": "Hello there, how are you doing today?"
    }
  ],
  "contextWindow": 1000,
  "reserveForCompletion": 200
}
Example response
{
  "contextWindow": 1000,
  "reserveForCompletion": 200,
  "availableForPrompt": 800,
  "promptTokens": 24,
  "totalTokens": 224,
  "fits": true,
  "overflowTokens": 0,
  "utilizationPct": 22.4,
  "perMessage": [
    {
      "index": 0,
      "role": "system",
      "tokens": 8
    },
    {
      "index": 1,
      "role": "user",
      "tokens": 13
    }
  ]
}
Input schema
{
  "type": "object",
  "required": [
    "messages",
    "contextWindow"
  ],
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "contextWindow": {
      "type": "number",
      "examples": [
        1000
      ]
    },
    "reserveForCompletion": {
      "type": "number",
      "default": 0
    },
    "perMessageOverhead": {
      "type": "number",
      "default": 4
    }
  }
}
Output schema
{
  "type": "object",
  "additionalProperties": true
}
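
A minimal client sketch for this endpoint, using only the Python standard library. The helper names (`build_payload`, `build_request`, `fetch_budget`) are hypothetical. The listing gives a per-request price but does not say how payment is attached, so that step is omitted here and an unpaid call may be rejected with an HTTP error. Because the output schema allows arbitrary properties, the response is returned as a plain dictionary rather than a typed object.

```python
import json
import urllib.request

API_URL = "https://x402.hexl.dev/v1/llm/token-budget"


def build_payload(messages, context_window,
                  reserve_for_completion=0, per_message_overhead=4):
    """Assemble a request body shaped like the input schema in this listing."""
    if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
        raise TypeError("messages must be a list of objects")
    numbers = {
        "contextWindow": context_window,
        "reserveForCompletion": reserve_for_completion,
        "perMessageOverhead": per_message_overhead,
    }
    for name, value in numbers.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
    return {"messages": messages, **numbers}


def build_request(payload):
    """Create the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_budget(payload, timeout=10.0):
    """Send the request and decode the JSON response.

    Raises urllib.error.HTTPError on a non-success status, which may
    include a payment-related rejection (assumption; not documented here).
    """
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.load(resp)


payload = build_payload(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello there, how are you doing today?"},
    ],
    context_window=1000,
    reserve_for_completion=200,
)
request = build_request(payload)  # call fetch_budget(payload) to actually send it
```

Splitting request construction from sending keeps the payload easy to inspect or log before spending a paid call.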