
Retrieval

Quantize embedding int8/binary API

Quantizes a float embedding to int8 (max-abs scaling, returned together with the scale needed to dequantize) or to binary (sign threshold), and reports the compression ratio and byte size. Typical questions it answers: 'How do I compress this embedding to int8?' and 'What is the binary quantization of this vector?'.

Price: $0.02 per request
Method: POST
Route: /v1/retrieval/quantize-vector
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: quantize, int8, binary, compression, embedding, memory, vector, rag
API URL: https://x402.hexl.dev/v1/retrieval/quantize-vector
Example request
{
  "vector": [
    0.9,
    -0.5,
    0.25,
    0
  ],
  "mode": "int8"
}
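The request body above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library: `build_request` is a hypothetical helper name, and the `Content-Type` header is inferred from the listed MIME type. The listing does not describe how the per-request payment is settled, so the sketch builds the request object and stops short of sending it.

```python
import json
import urllib.request

# URL and method are taken from the listing; everything else is illustrative.
API_URL = "https://x402.hexl.dev/v1/retrieval/quantize-vector"


def build_request(vector, mode="int8", threshold=None):
    """Build (but do not send) a POST request for the quantize-vector route.

    Hypothetical helper: the field names follow the input schema, but the
    payment/authentication step is not covered here.
    """
    if mode not in ("int8", "binary"):
        raise ValueError("mode must be 'int8' or 'binary'")
    body = {"vector": list(vector), "mode": mode}
    if threshold is not None:
        # 'threshold' is optional in the schema; only include it when set.
        body["threshold"] = threshold
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request([0.9, -0.5, 0.25, 0], mode="int8")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would issue the call, once whatever payment or authorization step the service requires has been handled.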
Example response
{
  "mode": "int8",
  "scale": 0.00708661,
  "quantized": [
    127,
    -71,
    35,
    0
  ],
  "bytes": 4,
  "compressionRatio": 4,
  "dimensions": 4,
  "note": "dequantize via value * scale"
}
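To make the numbers in the example response easier to check, here is a minimal local sketch of max-abs int8 quantization. It is not the service's implementation: the function names are hypothetical, rounding to nearest and the handling of an all-zero vector are assumptions, and the compression ratio assumes the input is stored as 4-byte float32 values (which is consistent with the ratio of 4 shown above).

```python
def quantize_int8(vector):
    """Max-abs int8 quantization: scale = max(|x|) / 127, q = round(x / scale)."""
    if not vector:
        raise ValueError("vector must not be empty")
    max_abs = max(abs(x) for x in vector)
    # Assumption: an all-zero vector maps to zeros with scale 0.
    scale = max_abs / 127 if max_abs > 0 else 0.0
    if scale == 0.0:
        quantized = [0] * len(vector)
    else:
        # Clamp defensively to the symmetric int8 range [-127, 127].
        quantized = [max(-127, min(127, round(x / scale))) for x in vector]
    size_bytes = len(quantized)  # one byte per int8 value
    return {
        "mode": "int8",
        "scale": scale,
        "quantized": quantized,
        "bytes": size_bytes,
        # Assumption: the baseline is float32, i.e. 4 bytes per dimension.
        "compressionRatio": (4 * len(vector)) / size_bytes,
        "dimensions": len(vector),
    }


def dequantize_int8(quantized, scale):
    """Approximate the original values via value * scale."""
    return [q * scale for q in quantized]


result = quantize_int8([0.9, -0.5, 0.25, 0])
print(result["quantized"])        # [127, -71, 35, 0]
print(round(result["scale"], 8))  # 0.00708661
print(dequantize_int8(result["quantized"], result["scale"]))
```

With round-to-nearest, each dequantized value differs from the original by at most half of `scale`, which is why the largest-magnitude component (0.9 here) is recovered almost exactly while smaller ones carry a small error.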
Input schema
{
  "type": "object",
  "required": [
    "vector"
  ],
  "properties": {
    "vector": {
      "type": "array",
      "items": {
        "type": "number"
      }
    },
    "mode": {
      "type": "string",
      "enum": [
        "int8",
        "binary"
      ]
    },
    "threshold": {
      "type": "number"
    }
  }
}
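The schema accepts an optional `threshold` alongside `mode: "binary"`, but the listing shows no binary response. The sketch below illustrates one plausible reading of sign-threshold quantization; the default threshold of 0, the strict `>` comparison, the most-significant-bit-first packing, and the function name are all assumptions rather than documented behavior of the service.

```python
def quantize_binary(vector, threshold=0.0):
    """Map each component to 1 if it exceeds the threshold, else 0.

    Assumptions (not confirmed by the listing): strict '>' comparison,
    default threshold 0, and bits packed MSB-first, 8 per byte.
    """
    bits = [1 if x > threshold else 0 for x in vector]
    packed = bytearray((len(bits) + 7) // 8)  # ceil(n / 8) bytes
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bits, bytes(packed)


bits, packed = quantize_binary([0.9, -0.5, 0.25, 0])
print(bits)          # [1, 0, 1, 0]
print(packed.hex())  # a0
```

Under these assumptions a vector of n dimensions occupies ceil(n / 8) bytes, so a 1024-dimensional float32 embedding (4096 bytes) would shrink to 128 bytes.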
Output schema
{
  "type": "object",
  "additionalProperties": true
}