LLM

LLM latency budget API

Models end-to-end LLM latency from token counts and throughput. The estimate is the time to first token (TTFT, optionally including prompt prefill) plus generation time, computed as completion tokens divided by tokens per second. The response reports the generation time, the total time, the effective throughput, and whether the total fits within a deadline budget. It answers questions such as 'How long will this generation take?' and 'Will this LLM call meet my latency budget?'.

Price: $0.02 per request
Method: POST
Route: /v1/llm/latency-budget
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: llm, latency, budget, throughput, ttft, tokens-per-second, deadline, agent
API URL: https://x402.hexl.dev/v1/llm/latency-budget
Example request
{
  "promptTokens": 1000,
  "completionTokens": 500,
  "tokensPerSecond": 50,
  "ttftMs": 300,
  "budgetMs": 12000
}
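A minimal Python sketch of how the example request above could be prepared, using only the standard library. The helper name `build_request` is hypothetical. The request is built but not sent: the listing shows a per-request price, so an unpaid call may be refused, and payment handling is not covered here.

```python
import json
import urllib.request

# URL, method, and MIME type are taken from the listing above.
API_URL = "https://x402.hexl.dev/v1/llm/latency-budget"


def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the endpoint.

    Hypothetical helper for illustration; payment headers are NOT included.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "promptTokens": 1000,
    "completionTokens": 500,
    "tokensPerSecond": 50,
    "ttftMs": 300,
    "budgetMs": 12000,
}
request = build_request(payload)

# Sending would look like urllib.request.urlopen(request); it is left out
# here because the call is paid and the payment flow is an unknown.
```

Keeping construction separate from sending makes it easy to inspect the method, URL, and body before spending a paid request.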
Example response
{
  "promptTokens": 1000,
  "completionTokens": 500,
  "ttftMs": 300,
  "generationMs": 10000,
  "totalMs": 10300,
  "totalSeconds": 10.3,
  "effectiveTps": 48.5,
  "withinBudget": true
}
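The figures in the example response can be reproduced locally. The sketch below is an approximation of the model described in the summary, not the service's actual implementation. Three points are assumptions: prefill time is taken as promptTokens / prefillTokensPerSecond and folded into ttftMs, effectiveTps is rounded to one decimal place (inferred from the example), and withinBudget is returned as None when no budget is supplied. The function name `estimate_latency` is hypothetical.

```python
from typing import Optional


def estimate_latency(
    completion_tokens: float,
    tokens_per_second: float,
    prompt_tokens: float = 0,
    ttft_ms: float = 0,
    prefill_tokens_per_second: Optional[float] = None,
    budget_ms: Optional[float] = None,
) -> dict:
    """Local approximation of the latency model (assumptions noted inline)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")

    # Assumption: prefill time is added on top of the supplied TTFT.
    prefill_ms = 0.0
    if prefill_tokens_per_second:
        prefill_ms = prompt_tokens / prefill_tokens_per_second * 1000

    first_token_ms = ttft_ms + prefill_ms
    generation_ms = completion_tokens / tokens_per_second * 1000
    total_ms = first_token_ms + generation_ms

    # Effective throughput: completion tokens over the whole wall-clock time.
    effective_tps = completion_tokens / (total_ms / 1000) if total_ms > 0 else 0.0

    return {
        "ttftMs": first_token_ms,
        "generationMs": generation_ms,
        "totalMs": total_ms,
        "totalSeconds": total_ms / 1000,
        "effectiveTps": round(effective_tps, 1),  # rounding is an assumption
        # Assumption: None when no budget was given.
        "withinBudget": None if budget_ms is None else total_ms <= budget_ms,
    }


result = estimate_latency(
    completion_tokens=500,
    tokens_per_second=50,
    prompt_tokens=1000,
    ttft_ms=300,
    budget_ms=12000,
)
```

With the example inputs, generation takes 500 / 50 = 10 s (10000 ms), the total is 300 + 10000 = 10300 ms, and effective throughput is 500 / 10.3 ≈ 48.5 tokens per second, which fits the 12000 ms budget.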
Input schema
{
  "type": "object",
  "required": [
    "completionTokens",
    "tokensPerSecond"
  ],
  "properties": {
    "promptTokens": {
      "type": "number",
      "examples": [
        1000
      ]
    },
    "completionTokens": {
      "type": "number",
      "examples": [
        500
      ]
    },
    "tokensPerSecond": {
      "type": "number",
      "examples": [
        50
      ]
    },
    "ttftMs": {
      "type": "number",
      "default": 0
    },
    "prefillTokensPerSecond": {
      "type": "number"
    },
    "budgetMs": {
      "type": "number"
    }
  }
}
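Since each call is billed, checking a payload against the input schema before sending can avoid paying for a rejected request. The sketch below hand-codes only what the schema above states: the two required fields, the numeric type of every property, and the default of 0 for ttftMs. The function names are hypothetical, and the service may apply further checks that the schema does not show.

```python
from numbers import Real
from typing import List

# Field lists copied from the input schema above.
REQUIRED_FIELDS = ("completionTokens", "tokensPerSecond")
NUMERIC_FIELDS = (
    "promptTokens",
    "completionTokens",
    "tokensPerSecond",
    "ttftMs",
    "prefillTokensPerSecond",
    "budgetMs",
)


def validate_input(payload: dict) -> List[str]:
    """Return a list of schema violations; an empty list means the payload passes."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name in NUMERIC_FIELDS:
        if name in payload:
            value = payload[name]
            # bool is a subclass of int in Python, but JSON booleans are not numbers.
            if isinstance(value, bool) or not isinstance(value, Real):
                errors.append(f"{name} must be a number")
    return errors


def with_defaults(payload: dict) -> dict:
    """Apply the one default the schema declares (ttftMs = 0)."""
    return {"ttftMs": 0, **payload}
```

For example, `validate_input({"promptTokens": 1000})` reports both required fields as missing, while the example request shown earlier passes with no errors.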
Output schema
{
  "type": "object",
  "additionalProperties": true
}