Retrieval

Dedupe vectors by cosine API

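Greedily removes near-duplicate vectors: each vector whose cosine similarity to an earlier kept vector is above the threshold is dropped, so the first occurrence is kept. The response lists the kept indices and ids and, for each dropped item, the id it duplicated and the similarity. Typical questions it answers: 'How do I remove duplicate chunks from retrieval?' and 'Which vectors are near-duplicates?'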
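
To make the keep-first behaviour concrete, here is a minimal local sketch of greedy cosine deduplication. It is not the service's implementation. The function names, the inclusive `>=` comparison, stopping at the first matching kept vector, treating zero-norm vectors as similarity 0, and falling back to indices when no ids are given are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 for a zero-norm vector (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe_by_cosine(
    vectors: Sequence[Sequence[float]],
    threshold: float,
    ids: Optional[Sequence[Any]] = None,
) -> Dict[str, Any]:
    """Greedy keep-first dedupe: compare each vector against already-kept ones."""
    # Assumption: when ids are omitted, positions stand in as labels.
    labels = list(ids) if ids is not None else list(range(len(vectors)))
    kept: List[int] = []
    dropped: List[Dict[str, Any]] = []
    for i, vec in enumerate(vectors):
        match = None
        for k in kept:
            sim = cosine(vec, vectors[k])
            if sim >= threshold:  # assumption: the threshold is inclusive
                match = (k, sim)
                break  # assumption: report the first kept match, not the best
        if match is None:
            kept.append(i)
        else:
            k, sim = match
            dropped.append(
                {"index": i, "id": labels[i], "duplicateOf": labels[k], "similarity": sim}
            )
    return {
        "threshold": threshold,
        "keptIndices": kept,
        "keptIds": [labels[i] for i in kept],
        "dropped": dropped,
        "keptCount": len(kept),
        "droppedCount": len(dropped),
        "totalInput": len(vectors),
    }


result = dedupe_by_cosine([[1, 0], [1, 0], [0, 1]], 0.99, ["d1", "d2", "d3"])
```

Because every vector is compared only against vectors that were already kept, input order matters: reordering the input can change which member of a duplicate group survives.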

Price: $0.02 per request
Method: POST
Route: /v1/retrieval/dedupe-by-cosine
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: dedupe, deduplication, cosine, near-duplicate, retrieval, filter, vector, rag
API URL: https://x402.hexl.dev/v1/retrieval/dedupe-by-cosine
Integration docs
Example request
{
  "vectors": [
    [
      1,
      0
    ],
    [
      1,
      0
    ],
    [
      0,
      1
    ]
  ],
  "threshold": 0.99,
  "ids": [
    "d1",
    "d2",
    "d3"
  ]
}
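
As a rough sketch, the request body above could be assembled in Python with the standard library as shown here. The helper name `build_dedupe_request` is hypothetical. The listing shows a per-request price, so a real call presumably needs a payment or authorization step that this page does not describe; the sketch therefore only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Optional, Sequence

API_URL = "https://x402.hexl.dev/v1/retrieval/dedupe-by-cosine"


def build_dedupe_request(
    vectors: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
    ids: Optional[Sequence[Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the dedupe route."""
    payload = {"vectors": [list(v) for v in vectors]}
    # Only "vectors" is required by the input schema; the rest is optional.
    if threshold is not None:
        payload["threshold"] = threshold
    if ids is not None:
        payload["ids"] = list(ids)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_dedupe_request([[1, 0], [1, 0], [0, 1]], 0.99, ["d1", "d2", "d3"])
```

Once whatever payment handling the service requires is in place, the request object can be passed to `urllib.request.urlopen`.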
Example response
{
  "threshold": 0.99,
  "keptIndices": [
    0,
    2
  ],
  "keptIds": [
    "d1",
    "d3"
  ],
  "dropped": [
    {
      "index": 1,
      "id": "d2",
      "duplicateOf": "d1",
      "similarity": 1
    }
  ],
  "keptCount": 2,
  "droppedCount": 1,
  "totalInput": 3
}
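
One way to use a response shaped like the example above is to filter the original chunk list by `keptIndices` and record which item each dropped one duplicated. The helper name `apply_dedupe` and the sample chunk texts are hypothetical; the field names are taken from the example response.

```python
from typing import Any, Dict, List, Sequence, Tuple


def apply_dedupe(
    chunks: Sequence[Any], response: Dict[str, Any]
) -> Tuple[List[Any], Dict[Any, Any]]:
    """Return the surviving chunks plus a map of dropped id -> id it duplicated."""
    kept = [chunks[i] for i in response["keptIndices"]]
    duplicate_of = {d["id"]: d["duplicateOf"] for d in response["dropped"]}
    return kept, duplicate_of


# Response copied from the example on this page.
example_response = {
    "threshold": 0.99,
    "keptIndices": [0, 2],
    "keptIds": ["d1", "d3"],
    "dropped": [{"index": 1, "id": "d2", "duplicateOf": "d1", "similarity": 1}],
    "keptCount": 2,
    "droppedCount": 1,
    "totalInput": 3,
}

# Hypothetical chunk texts, in the same order as the submitted vectors.
chunks = ["first chunk", "first chunk (copy)", "second chunk"]
kept_chunks, duplicate_of = apply_dedupe(chunks, example_response)
```

This relies on the chunks being in the same order as the vectors that were sent, since `keptIndices` refers to input positions.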
Input schema
{
  "type": "object",
  "required": [
    "vectors"
  ],
  "properties": {
    "vectors": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": "number"
        }
      }
    },
    "threshold": {
      "type": "number"
    },
    "ids": {
      "type": "array"
    }
  }
}
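
A payload can be checked against this schema before spending a paid request. The following is a hand-written sketch rather than a general JSON Schema validator. The function name `validate_payload` is hypothetical, and the check that `ids` has one entry per vector goes beyond the schema and is an assumption based on the example request.

```python
from typing import Any, List


def _is_number(value: Any) -> bool:
    # JSON numbers map to int/float; bool is excluded even though it subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors: List[str] = []
    vectors = payload.get("vectors")
    if "vectors" not in payload:
        errors.append("vectors is required")
    elif not isinstance(vectors, list):
        errors.append("vectors must be an array")
    else:
        for i, vec in enumerate(vectors):
            if not isinstance(vec, list) or not all(_is_number(x) for x in vec):
                errors.append(f"vectors[{i}] must be an array of numbers")
    if "threshold" in payload and not _is_number(payload["threshold"]):
        errors.append("threshold must be a number")
    if "ids" in payload:
        ids = payload["ids"]
        if not isinstance(ids, list):
            errors.append("ids must be an array")
        elif isinstance(vectors, list) and len(ids) != len(vectors):
            # Assumption beyond the schema: ids are aligned one-to-one with vectors.
            errors.append("ids should have one entry per vector")
    return errors


problems = validate_payload(
    {"vectors": [[1, 0], [1, 0], [0, 1]], "threshold": 0.99, "ids": ["d1", "d2", "d3"]}
)
```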
Output schema
{
  "type": "object",
  "additionalProperties": true
}