Retrieval
Quantize embedding int8/binary API
Quantizes a float embedding to int8 (max-abs scaling, with dequant scale) or binary (sign threshold), reporting compression ratio and byte size. Answers 'How do I compress this embedding to int8?', 'What is the binary quantization of this vector?'.
Price$0.02per request
MethodPOST
Route/v1/retrieval/quantize-vector
StatusLive
MIME typeapplication/json
Rate limit120/minute
Cache0s public
quantizeint8binarycompressionembeddingmemoryvectorrag
API URL
Integration docshttps://x402.hexl.dev/v1/retrieval/quantize-vectorExample request
{
"vector": [
0.9,
-0.5,
0.25,
0
],
"mode": "int8"
}Example response
{
"mode": "int8",
"scale": 0.00708661,
"quantized": [
127,
-71,
35,
0
],
"bytes": 4,
"compressionRatio": 4,
"dimensions": 4,
"note": "dequantize via value * scale"
}Input schema
{
"type": "object",
"required": [
"vector"
],
"properties": {
"vector": {
"type": "array",
"items": {
"type": "number"
}
},
"mode": {
"type": "string",
"enum": [
"int8",
"binary"
]
},
"threshold": {
"type": "number"
}
}
}Output schema
{
"type": "object",
"additionalProperties": true
}