LLM latency budget API
Models end-to-end LLM latency from token counts and throughput. Total time is the time-to-first-token (optionally including prefill) plus completion tokens divided by tokens per second. The response reports the generation time, the total time, the effective throughput, and whether the total fits a deadline budget. It answers questions such as 'How long will this generation take?' and 'Will this LLM call meet my latency budget?'
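The arithmetic described above can be approximated locally. The following is a minimal sketch, not the service's implementation: the function name `latency_budget` is hypothetical, and two details are assumptions inferred from the example response, namely that prefill time is added only when a prefill rate is supplied, and that effective throughput is completion tokens divided by total seconds, rounded to one decimal.

```python
from typing import Optional


def latency_budget(
    completion_tokens: float,
    tokens_per_second: float,
    prompt_tokens: float = 0,
    ttft_ms: float = 0,
    prefill_tokens_per_second: Optional[float] = None,
    budget_ms: Optional[float] = None,
) -> dict:
    """Local approximation of the latency model (hypothetical helper)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")

    # Assumption: prefill time is counted only when a prefill rate is given.
    prefill_ms = 0.0
    if prefill_tokens_per_second:
        prefill_ms = prompt_tokens / prefill_tokens_per_second * 1000

    # Decode time: completion tokens divided by decode throughput.
    generation_ms = completion_tokens / tokens_per_second * 1000
    total_ms = ttft_ms + prefill_ms + generation_ms

    result = {
        "generationMs": generation_ms,
        "totalMs": total_ms,
        "totalSeconds": total_ms / 1000,
        # Assumption: effective throughput = completion tokens / total seconds.
        "effectiveTps": (
            round(completion_tokens / (total_ms / 1000), 1) if total_ms > 0 else 0.0
        ),
    }
    # The budget check is only meaningful when a budget was supplied.
    if budget_ms is not None:
        result["withinBudget"] = total_ms <= budget_ms
    return result


if __name__ == "__main__":
    print(latency_budget(500, 50, prompt_tokens=1000, ttft_ms=300, budget_ms=12000))
```

With the values from the example request (500 completion tokens at 50 tokens per second, 300 ms TTFT, 12000 ms budget), this yields 10000 ms of generation and 10300 ms in total, which matches the example response.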
Price: $0.02 per request
Method: POST
Route: /v1/llm/latency-budget
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: llm, latency, budget, throughput, ttft, tokens-per-second, deadline, agent
API URL
https://x402.hexl.dev/v1/llm/latency-budget
Example request
{
  "promptTokens": 1000,
  "completionTokens": 500,
  "tokensPerSecond": 50,
  "ttftMs": 300,
  "budgetMs": 12000
}
Example response
{
  "promptTokens": 1000,
  "completionTokens": 500,
  "ttftMs": 300,
  "generationMs": 10000,
  "totalMs": 10300,
  "totalSeconds": 10.3,
  "effectiveTps": 48.5,
  "withinBudget": true
}
Input schema
{
  "type": "object",
  "required": [
    "completionTokens",
    "tokensPerSecond"
  ],
  "properties": {
    "promptTokens": {
      "type": "number",
      "examples": [
        1000
      ]
    },
    "completionTokens": {
      "type": "number",
      "examples": [
        500
      ]
    },
    "tokensPerSecond": {
      "type": "number",
      "examples": [
        50
      ]
    },
    "ttftMs": {
      "type": "number",
      "default": 0
    },
    "prefillTokensPerSecond": {
      "type": "number"
    },
    "budgetMs": {
      "type": "number"
    }
  }
}
Output schema
{
  "type": "object",
  "additionalProperties": true
}
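Because the input schema requires only completionTokens and tokensPerSecond, and the output schema accepts any object, a client may want to validate its payload before sending and read response fields defensively. The sketch below uses only the Python standard library. The helper names `build_request` and `read_number` are hypothetical, and the code only prepares the POST request: sending it, and whatever payment step the $0.02 per-request price implies, is not covered here.

```python
import json
import urllib.request

# URL and field names are taken from the listing above.
API_URL = "https://x402.hexl.dev/v1/llm/latency-budget"
REQUIRED = ("completionTokens", "tokensPerSecond")
OPTIONAL = ("promptTokens", "ttftMs", "prefillTokensPerSecond", "budgetMs")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_request(payload: dict) -> urllib.request.Request:
    """Validate a payload against the input schema and prepare a POST request."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key in REQUIRED + OPTIONAL:
        if key in payload and not _is_number(payload[key]):
            raise TypeError(f"{key} must be a number")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_number(response: dict, name: str, default=None):
    """Return a numeric field, or the default if it is absent or not a number.

    The output schema does not guarantee any particular field.
    """
    value = response.get(name, default)
    return value if _is_number(value) else default


if __name__ == "__main__":
    req = build_request({"completionTokens": 500, "tokensPerSecond": 50})
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the validation easy to test offline; a caller could pass the prepared request to `urllib.request.urlopen` once payment handling is in place.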