
LLM

Token budget planner API

Estimates the token count of each message using a real GPT tokenizer plus ChatML framing overhead, reserves room for the completion, and reports whether the conversation fits a given context window, how many tokens it overflows by, and the utilization percentage. It answers questions such as 'Will these messages fit in my context window?' and 'How many tokens of headroom do I have left?'

Price: $0.02 per request
Method: POST
Route: /v1/llm/token-budget
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: llm, tokens, context-window, budget, tokenizer, planner, prompt, agent
API URL: https://x402.hexl.dev/v1/llm/token-budget
Example request
{
  "messages": [
    {
      "role": "system",
      "content": "You are helpful."
    },
    {
      "role": "user",
      "content": "Hello there, how are you doing today?"
    }
  ],
  "contextWindow": 1000,
  "reserveForCompletion": 200
}
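A minimal Python sketch of how the example request might be assembled, using only the standard library. The helper name `build_request` is hypothetical. The sketch builds the request without sending it, because this listing does not describe how the per-request payment is attached, so that step is left out rather than guessed.

```python
import json
import urllib.request

# URL as given in this listing.
API_URL = "https://x402.hexl.dev/v1/llm/token-budget"


def build_request(messages, context_window, reserve_for_completion=0):
    """Build (but do not send) a JSON POST request for the token-budget route."""
    payload = {
        "messages": messages,
        "contextWindow": context_window,
        "reserveForCompletion": reserve_for_completion,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello there, how are you doing today?"},
    ],
    context_window=1000,
    reserve_for_completion=200,
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
# Payment handling is not covered here (assumption: it is required, given the listed price).
```

Keeping construction separate from sending makes it easy to inspect the payload, or to hand the request to whatever client handles payment.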
Example response
{
  "contextWindow": 1000,
  "reserveForCompletion": 200,
  "availableForPrompt": 800,
  "promptTokens": 24,
  "totalTokens": 224,
  "fits": true,
  "overflowTokens": 0,
  "utilizationPct": 22.4,
  "perMessage": [
    {
      "index": 0,
      "role": "system",
      "tokens": 8
    },
    {
      "index": 1,
      "role": "user",
      "tokens": 13
    }
  ]
}
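The derived fields in the example response are consistent with simple arithmetic, which the sketch below reproduces. The formulas are inferred from this single example and are not a documented contract. In particular, the per-message counts (8 + 13 = 21) fall 3 tokens short of `promptTokens` (24); the hypothetical `reply_priming` parameter models that gap as a fixed constant, which is an assumption.

```python
def plan_budget(per_message_tokens, context_window,
                reserve_for_completion=0, reply_priming=3):
    """Recompute the response's derived fields from per-message token counts.

    reply_priming is a hypothetical constant chosen to match the example
    response; the service may compute promptTokens differently.
    """
    prompt_tokens = sum(per_message_tokens) + reply_priming
    total_tokens = prompt_tokens + reserve_for_completion
    return {
        "availableForPrompt": context_window - reserve_for_completion,
        "promptTokens": prompt_tokens,
        "totalTokens": total_tokens,
        "fits": total_tokens <= context_window,
        # Overflow is zero whenever the conversation fits.
        "overflowTokens": max(0, total_tokens - context_window),
        "utilizationPct": round(100 * total_tokens / context_window, 1),
    }


# Per-message counts taken from the example response.
budget = plan_budget([8, 13], context_window=1000, reserve_for_completion=200)
```

With the example's numbers this yields 24 prompt tokens, 224 total tokens, and 22.4% utilization, matching the response shown.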
Input schema
{
  "type": "object",
  "required": [
    "messages",
    "contextWindow"
  ],
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "contextWindow": {
      "type": "number",
      "examples": [
        1000
      ]
    },
    "reserveForCompletion": {
      "type": "number",
      "default": 0
    },
    "perMessageOverhead": {
      "type": "number",
      "default": 4
    }
  }
}
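A client could check a payload against this schema before paying for a request. The following is a rough sketch: `normalize_request` is a hypothetical helper, and filling in the schema defaults on the client side is an assumption about how the service treats omitted fields.

```python
# Defaults as declared in the input schema.
DEFAULTS = {"reserveForCompletion": 0, "perMessageOverhead": 4}


def _is_number(value):
    # JSON Schema "number" excludes booleans, but bool subclasses int in Python.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def normalize_request(payload):
    """Validate a payload against the input schema and fill in schema defaults."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    for key in ("messages", "contextWindow"):
        if key not in payload:
            raise ValueError(f"missing required field: {key}")
    messages = payload["messages"]
    if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
        raise ValueError("messages must be an array of objects")
    # Caller-supplied values override the defaults.
    result = {**DEFAULTS, **payload}
    for key in ("contextWindow", "reserveForCompletion", "perMessageOverhead"):
        if not _is_number(result[key]):
            raise ValueError(f"{key} must be a number")
    return result


normalized = normalize_request({"messages": [], "contextWindow": 1000})
```

The schema only constrains message items to be objects, so the sketch does not check for `role` or `content` keys.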
Output schema
{
  "type": "object",
  "additionalProperties": true
}