RAG context packer API

Greedily packs the highest-scoring retrieval chunks under a token budget. Chunks are considered in descending relevance order, and separator tokens count toward the budget. The response contains the assembled context string, the included and dropped chunks, and the budget utilization. This is the core assembly step of a RAG pipeline. It answers questions such as 'Which retrieved chunks fit in my context budget?' and 'How do I assemble the best context for RAG?'.
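The packing behaviour described above can be approximated locally. The following is a minimal sketch, not the service's implementation: `count_tokens` is a hypothetical whitespace word counter (the service's tokenizer is not documented here), and the sketch assumes that a chunk which does not fit is skipped while lower-scoring chunks are still tried.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_chunks(
    chunks: list[dict],
    budget: int,
    separator: str = "\n\n---\n\n",
    per_chunk_overhead: int = 0,
    count: Callable[[str], int] = count_tokens,
) -> dict:
    """Greedy packing sketch: highest score first, separators charged to the budget."""
    sep_cost = count(separator)
    packed, excluded, used = [], [], 0
    # sorted() is stable, so chunks with equal scores keep their input order.
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        tokens = count(chunk["text"])
        # A separator is only needed between chunks, so the first one is free.
        cost = tokens + per_chunk_overhead + (sep_cost if packed else 0)
        entry = {**chunk, "tokens": tokens}
        if used + cost <= budget:
            packed.append(entry)
            used += cost
        else:
            # Assumption: skip this chunk and keep trying smaller ones.
            excluded.append(entry)
    return {
        "budget": budget,
        "packedTokens": used,
        "utilizationPct": round(100 * used / budget, 1) if budget > 0 else 0.0,
        "includedCount": len(packed),
        "excludedCount": len(excluded),
        "packed": packed,
        "excluded": excluded,
        "context": separator.join(c["text"] for c in packed),
    }
```

Because the token counter is a stand-in, token totals from this sketch will generally differ from the service's, even when the same chunks are selected in the same order.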

Price: $0.04 per request

Method: POST

Route: /v1/llm/rag-pack

Status: Live

MIME type: application/json

Rate limit: 120/minute

Cache: 0s public

Tags: llm, rag, context, retrieval, pack, tokens, budget, agent

API URL: https://x402.hexl.dev/v1/llm/rag-pack

Integration docs

Example request

{
  "chunks": [
    {
      "id": "c1",
      "text": "alpha beta gamma",
      "score": 0.9
    },
    {
      "id": "c2",
      "text": "delta epsilon",
      "score": 0.3
    },
    {
      "id": "c3",
      "text": "zeta",
      "score": 0.95
    }
  ],
  "budget": 12
}
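
As a rough illustration, the example request can be prepared with Python's standard library. This sketch only builds the POST request and does not send it. Any payment or authentication headers the service may require are not described on this page and are left out. `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://x402.hexl.dev/v1/llm/rag-pack"


def build_request(chunks: list[dict], budget: int) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the rag-pack route."""
    body = json.dumps({"chunks": chunks, "budget": budget}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    [
        {"id": "c1", "text": "alpha beta gamma", "score": 0.9},
        {"id": "c2", "text": "delta epsilon", "score": 0.3},
        {"id": "c3", "text": "zeta", "score": 0.95},
    ],
    budget=12,
)
```

Passing `req` to `urllib.request.urlopen` would issue the call. That step is omitted here because the access requirements are outside what this page documents.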

Example response

{
  "budget": 12,
  "packedTokens": 11,
  "utilizationPct": 91.7,
  "includedCount": 3,
  "excludedCount": 0,
  "packed": [
    {
      "id": "c3",
      "text": "zeta",
      "tokens": 2,
      "score": 0.95
    },
    {
      "id": "c1",
      "text": "alpha beta gamma",
      "tokens": 3,
      "score": 0.9
    },
    {
      "id": "c2",
      "text": "delta epsilon",
      "tokens": 2,
      "score": 0.3
    }
  ],
  "excluded": [],
  "context": "zeta\n\n---\n\nalpha beta gamma\n\n---\n\ndelta epsilon"
}
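
The numbers in the example response can be cross-checked with a few lines of Python. This sketch copies the values from the response above and assumes the default separator and a `perChunkOverhead` of 0, since the example request sets neither.

```python
response = {
    "budget": 12,
    "packedTokens": 11,
    "utilizationPct": 91.7,
    "packed": [
        {"id": "c3", "text": "zeta", "tokens": 2, "score": 0.95},
        {"id": "c1", "text": "alpha beta gamma", "tokens": 3, "score": 0.9},
        {"id": "c2", "text": "delta epsilon", "tokens": 2, "score": 0.3},
    ],
    "context": "zeta\n\n---\n\nalpha beta gamma\n\n---\n\ndelta epsilon",
}
DEFAULT_SEPARATOR = "\n\n---\n\n"

# Tokens attributed to the chunk texts themselves.
chunk_tokens = sum(c["tokens"] for c in response["packed"])

# Whatever remains of packedTokens is not chunk text (separators, overhead).
non_chunk_tokens = response["packedTokens"] - chunk_tokens

# Utilization is packed tokens as a percentage of the budget, to one decimal.
expected_pct = round(100 * response["packedTokens"] / response["budget"], 1)

# The context string is the packed texts joined by the separator.
rebuilt_context = DEFAULT_SEPARATOR.join(c["text"] for c in response["packed"])

print(chunk_tokens, non_chunk_tokens, expected_pct)
```

The chunk texts account for 7 of the 11 packed tokens. The remaining 4 are consistent with two separators costing 2 tokens each, although the page does not state the per-separator cost explicitly.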

Input schema

{
  "type": "object",
  "required": [
    "chunks",
    "budget"
  ],
  "properties": {
    "chunks": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "budget": {
      "type": "number",
      "examples": [
        12
      ]
    },
    "separator": {
      "type": "string",
      "default": "\n\n---\n\n"
    },
    "perChunkOverhead": {
      "type": "number",
      "default": 0
    }
  }
}
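
A client might check a payload against this schema before paying for a request. The sketch below hand-codes only the constraints listed in the schema above (required fields, types, defaults). `validate_request` is a hypothetical helper, not part of the API. The schema declares chunk items only as objects, so the chunk fields used in the example (`id`, `text`, `score`) are deliberately not checked.

```python
DEFAULTS = {"separator": "\n\n---\n\n", "perChunkOverhead": 0}


def is_number(value: object) -> bool:
    """JSON Schema 'number': int or float, but not bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_request(payload: dict) -> tuple[dict, list[str]]:
    """Return the payload with schema defaults applied, plus any errors found."""
    errors = []
    for field in ("chunks", "budget"):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    if "chunks" in payload:
        chunks = payload["chunks"]
        if not isinstance(chunks, list):
            errors.append("chunks must be an array")
        elif not all(isinstance(c, dict) for c in chunks):
            errors.append("every chunk must be an object")
    for field in ("budget", "perChunkOverhead"):
        if field in payload and not is_number(payload[field]):
            errors.append(f"{field} must be a number")
    if "separator" in payload and not isinstance(payload["separator"], str):
        errors.append("separator must be a string")
    # Explicit values in the payload override the schema defaults.
    return {**DEFAULTS, **payload}, errors
```

Calling `validate_request({"chunks": [], "budget": 12})` returns an empty error list and a payload in which `separator` and `perChunkOverhead` carry their schema defaults.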

Output schema

{
  "type": "object",
  "additionalProperties": true
}