LLM
Truncate chat to fit budget API
Truncates a message list to fit a token budget using one of three strategies: drop-middle, sliding-window (keep latest), or keep-ends. A leading system message is always pinned. Returns the kept messages in order, the tokens used, and which indices were dropped. Answers questions such as 'How do I trim a long conversation to fit the window?' and 'Which messages should I drop to stay under budget?'
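To make the sliding-window behaviour described above concrete, here is a minimal local sketch. It is not the service's implementation. The names `estimate_tokens` and `sliding_window` are hypothetical, and the 4-characters-per-token estimate is an assumption, so the counts it produces will generally differ from the `tokens` values the API returns. The per-message overhead of 4 mirrors the `perMessageOverhead` default in the input schema.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def estimate_tokens(message: Message, per_message_overhead: int = 4) -> int:
    """Rough token estimate. ASSUMPTION: about 4 characters per token."""
    content_tokens = -(-len(message.get("content", "")) // 4)  # ceiling division
    return content_tokens + per_message_overhead


def sliding_window(
    messages: List[Message], budget: int, per_message_overhead: int = 4
) -> Tuple[List[int], int, List[int]]:
    """Return (kept indices in order, tokens used, dropped indices)."""
    pinned: List[int] = []
    used = 0
    start = 0

    # Pin a leading system message, as the description says it is always kept.
    if messages and messages[0].get("role") == "system":
        pinned.append(0)
        used += estimate_tokens(messages[0], per_message_overhead)
        start = 1

    # Walk from the newest message backwards and stop at the first one
    # that would push the total over the budget.
    tail: List[int] = []
    for i in range(len(messages) - 1, start - 1, -1):
        cost = estimate_tokens(messages[i], per_message_overhead)
        if used + cost > budget:
            break
        tail.append(i)
        used += cost

    kept = pinned + sorted(tail)
    dropped = [i for i in range(len(messages)) if i not in kept]
    return kept, used, dropped
```

Under this estimate the oldest non-system messages are dropped first. The other two strategies (drop-middle and keep-ends) would differ only in which indices are considered for removal.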
Price: $0.03 per request
Method: POST
Route: /v1/llm/truncate-context
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: llm, context, truncate, tokens, sliding-window, chat-history, budget, agent
API URL
https://x402.hexl.dev/v1/llm/truncate-context
Example request
{
  "messages": [
    {
      "role": "system",
      "content": "sys"
    },
    {
      "role": "user",
      "content": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    },
    {
      "role": "assistant",
      "content": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
    },
    {
      "role": "user",
      "content": "latest question"
    }
  ],
  "budget": 40,
  "strategy": "sliding-window"
}
Example response
{
  "strategy": "sliding-window",
  "budget": 40,
  "keptTokens": 34,
  "droppedCount": 0,
  "kept": [
    {
      "index": 0,
      "role": "system",
      "content": "sys",
      "tokens": 5
    },
    {
      "index": 1,
      "role": "user",
      "content": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
      "tokens": 9
    },
    {
      "index": 2,
      "role": "assistant",
      "content": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
      "tokens": 14
    },
    {
      "index": 3,
      "role": "user",
      "content": "latest question",
      "tokens": 6
    }
  ],
  "dropped": []
}
Input schema
{
  "type": "object",
  "required": [
    "messages",
    "budget"
  ],
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "budget": {
      "type": "number",
      "examples": [
        40
      ]
    },
    "strategy": {
      "type": "string",
      "enum": [
        "drop-middle",
        "sliding-window",
        "keep-ends"
      ],
      "default": "sliding-window"
    },
    "perMessageOverhead": {
      "type": "number",
      "default": 4
    }
  }
}
Output schema
{
  "type": "object",
  "additionalProperties": true
}
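Because the output schema is open-ended, a client has to rely on the field names seen in the example response. The sketch below builds a POST request that matches the input schema and reads the documented response fields. The helper names `build_request` and `summarize_response` are hypothetical. The listing states a price per request, so a real call presumably needs a payment step that is not described in this section and is left out here. The shape of entries in `dropped` is also not shown in the example (it is an empty list), so the sketch passes that list through untouched.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_URL = "https://x402.hexl.dev/v1/llm/truncate-context"
STRATEGIES = ("drop-middle", "sliding-window", "keep-ends")  # enum from the input schema


def build_request(
    messages: List[Dict[str, str]],
    budget: int,
    strategy: str = "sliding-window",
    per_message_overhead: int = 4,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the truncate-context route."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}")
    payload = {
        "messages": messages,
        "budget": budget,
        "strategy": strategy,
        "perMessageOverhead": per_message_overhead,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields shown in the example response, tolerating missing keys."""
    kept = response.get("kept", [])
    return {
        "kept_indices": [m["index"] for m in kept],
        "kept_tokens": response.get("keptTokens"),
        "dropped": response.get("dropped", []),  # entry shape is not documented
    }
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON body, once whatever payment handling the service requires is in place.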