LLM latency budget API

Models end-to-end LLM latency from token counts and throughput. Total time is the time to first token (optionally including prefill) plus completion tokens divided by tokens per second. The response reports the total time, the effective throughput, and whether the call fits a deadline budget. Answers questions such as 'How long will this generation take?' and 'Will this LLM call meet my latency budget?'

Price: $0.02 per request

Method: POST

Route: /v1/llm/latency-budget

Status: Live

MIME type: application/json

Rate limit: 120/minute

Cache: 0s public

Tags: llm, latency, budget, throughput, ttft, tokens-per-second, deadline, agent

API URL: https://x402.hexl.dev/v1/llm/latency-budget

Integration docs

Example request

{
  "promptTokens": 1000,
  "completionTokens": 500,
  "tokensPerSecond": 50,
  "ttftMs": 300,
  "budgetMs": 12000
}
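
A minimal sketch of how the example request might be sent from Python, using only the standard library. The helper names `build_request` and `send` are hypothetical. The listing shows a per-request price, and how payment is attached is not covered here, so an unpaid call may well be rejected.

```python
import json
import urllib.request

# URL, method and MIME type are taken from the listing above.
API_URL = "https://x402.hexl.dev/v1/llm/latency-budget"


def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the JSON payload (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply.

    Assumption: payment for the priced endpoint is handled elsewhere;
    without it this call may raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
example_request = build_request({
    "promptTokens": 1000,
    "completionTokens": 500,
    "tokensPerSecond": 50,
    "ttftMs": 300,
    "budgetMs": 12000,
})
```

Calling `send(example_request)` would perform the actual HTTP call. Keeping construction and sending separate makes the payload easy to inspect before anything is billed.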

Example response

{
  "promptTokens": 1000,
  "completionTokens": 500,
  "ttftMs": 300,
  "generationMs": 10000,
  "totalMs": 10300,
  "totalSeconds": 10.3,
  "effectiveTps": 48.5,
  "withinBudget": true
}
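
The figures in the example response are consistent with simple arithmetic: 500 tokens at 50 tokens per second is 10,000 ms, plus 300 ms to first token gives 10,300 ms, and 500 tokens over 10.3 s is about 48.5 tokens per second. The sketch below reproduces that locally. It is not the service's implementation: the function name `estimate_latency`, the way prefill time is added, and the rounding to one decimal place are all assumptions.

```python
from typing import Optional


def estimate_latency(
    completion_tokens: float,
    tokens_per_second: float,
    ttft_ms: float = 0.0,
    prompt_tokens: float = 0.0,
    prefill_tokens_per_second: Optional[float] = None,
    budget_ms: Optional[float] = None,
) -> dict:
    """Local approximation of the latency model (hypothetical helper)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")

    # Assumption: prefill time is added on top of ttftMs, and only when
    # a prefill rate is supplied.
    prefill_ms = 0.0
    if prefill_tokens_per_second:
        prefill_ms = prompt_tokens / prefill_tokens_per_second * 1000.0

    generation_ms = completion_tokens / tokens_per_second * 1000.0
    total_ms = ttft_ms + prefill_ms + generation_ms

    # Effective throughput counts completion tokens over the whole call.
    effective_tps = 0.0
    if total_ms > 0:
        effective_tps = completion_tokens / (total_ms / 1000.0)

    result = {
        "generationMs": generation_ms,
        "totalMs": total_ms,
        "totalSeconds": total_ms / 1000.0,
        # Assumption: one decimal place, matching the example response.
        "effectiveTps": round(effective_tps, 1),
    }
    if budget_ms is not None:
        result["withinBudget"] = total_ms <= budget_ms
    return result


estimate = estimate_latency(
    completion_tokens=500,
    tokens_per_second=50,
    ttft_ms=300,
    prompt_tokens=1000,
    budget_ms=12000,
)
```

Because no `prefillTokensPerSecond` is given in the example, the 1000 prompt tokens do not change the total in this sketch, which agrees with the 10,300 ms shown above.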

Input schema

{
  "type": "object",
  "required": [
    "completionTokens",
    "tokensPerSecond"
  ],
  "properties": {
    "promptTokens": {
      "type": "number",
      "examples": [
        1000
      ]
    },
    "completionTokens": {
      "type": "number",
      "examples": [
        500
      ]
    },
    "tokensPerSecond": {
      "type": "number",
      "examples": [
        50
      ]
    },
    "ttftMs": {
      "type": "number",
      "default": 0
    },
    "prefillTokensPerSecond": {
      "type": "number"
    },
    "budgetMs": {
      "type": "number"
    }
  }
}
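
A client could check a payload against this schema before paying for a request. The following is a hand-rolled sketch rather than a full JSON Schema validator; `validate_input` is a hypothetical name, and it enforces only what the schema states: the two required fields, numeric types, and the `ttftMs` default of 0.

```python
REQUIRED_FIELDS = ("completionTokens", "tokensPerSecond")
NUMERIC_FIELDS = (
    "promptTokens",
    "completionTokens",
    "tokensPerSecond",
    "ttftMs",
    "prefillTokensPerSecond",
    "budgetMs",
)


def validate_input(payload: dict) -> dict:
    """Return a copy of payload with defaults applied, or raise ValueError."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name in NUMERIC_FIELDS:
        if name in payload:
            value = payload[name]
            # bool is a subclass of int in Python, but JSON Schema's
            # "number" type does not accept booleans.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name} must be a number")
    if errors:
        raise ValueError("; ".join(errors))

    normalized = dict(payload)
    normalized.setdefault("ttftMs", 0)  # default stated in the schema
    return normalized
```

The schema itself places no lower bounds on the values, so checks such as rejecting a zero or negative `tokensPerSecond` would be an extra client-side choice.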

Output schema

{
  "type": "object",
  "additionalProperties": true
}
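
Since the output schema only promises an object with arbitrary properties, a client may want to read fields defensively. A small sketch, assuming `withinBudget` can be absent (for instance when no `budgetMs` was sent); the helper name `within_budget` is hypothetical.

```python
from typing import Optional


def within_budget(response: dict) -> Optional[bool]:
    """Return the withinBudget flag, or None if it is missing or not a boolean."""
    value = response.get("withinBudget")
    return value if isinstance(value, bool) else None
```

Returning `None` rather than `False` keeps "no verdict" distinct from "over budget".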