Quantize embedding int8/binary API

Quantizes a float embedding to int8 (max-abs scaling, with dequant scale) or binary (sign threshold), reporting compression ratio and byte size. Answers 'How do I compress this embedding to int8?', 'What is the binary quantization of this vector?'.

Price$0.02per request

MethodPOST

Route/v1/retrieval/quantize-vector

StatusLive

MIME typeapplication/json

Rate limit120/minute

Cache0s public

quantizeint8binarycompressionembeddingmemoryvectorrag

API URLhttps://x402.hexl.dev/v1/retrieval/quantize-vector

Integration docs

Example request

{
  "vector": [
    0.9,
    -0.5,
    0.25,
    0
  ],
  "mode": "int8"
}

Example response

{
  "mode": "int8",
  "scale": 0.00708661,
  "quantized": [
    127,
    -71,
    35,
    0
  ],
  "bytes": 4,
  "compressionRatio": 4,
  "dimensions": 4,
  "note": "dequantize via value * scale"
}

Input schema

{
  "type": "object",
  "required": [
    "vector"
  ],
  "properties": {
    "vector": {
      "type": "array",
      "items": {
        "type": "number"
      }
    },
    "mode": {
      "type": "string",
      "enum": [
        "int8",
        "binary"
      ]
    },
    "threshold": {
      "type": "number"
    }
  }
}

Output schema

{
  "type": "object",
  "additionalProperties": true
}