Retrieval

Shingle Jaccard near-dup API

Computes the Jaccard similarity between the character or word n-gram shingle sets of two texts, for near-duplicate detection. It answers questions such as 'How similar are these two texts by shingling?' and 'Are these documents near-duplicates?'.

Price: $0.01 per request
Method: POST
Route: /v1/retrieval/shingle-jaccard
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s public
Tags: shingle, jaccard, ngram, near-duplicate, dedup, text, similarity, rag
API URL: https://x402.hexl.dev/v1/retrieval/shingle-jaccard
Example request
{
  "a": "the quick brown fox",
  "b": "the quick red fox",
  "shingleSize": 3,
  "mode": "word"
}
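A minimal sketch of how the example request could be assembled in Python, using only the standard library. The helper name `build_request` and its default arguments are illustrative assumptions, not part of the API. How the per-request price is paid is not described on this page, so payment handling is left out and the request is built but not sent.

```python
import json
import urllib.request

# URL, method, and MIME type are taken from the listing above.
API_URL = "https://x402.hexl.dev/v1/retrieval/shingle-jaccard"


def build_request(a: str, b: str, shingle_size: int = 3, mode: str = "word") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body.

    Hypothetical helper: the defaults mirror the example request and are
    not documented API defaults.
    """
    payload = {"a": a, "b": b, "shingleSize": shingle_size, "mode": mode}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("the quick brown fox", "the quick red fox")
# Sending would be urllib.request.urlopen(req); it is omitted here because
# the endpoint is paid and its payment flow is not covered on this page.
```

Keeping construction separate from sending makes it easy to inspect the body and headers before spending a paid call.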
Example response
{
  "jaccardSimilarity": 0,
  "mode": "word",
  "shingleSize": 3,
  "shinglesA": 2,
  "shinglesB": 2,
  "intersection": 0,
  "union": 4,
  "isNearDuplicate": false
}
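The figures in the example response can be reproduced locally with a short sketch of shingle-based Jaccard similarity. This is an illustration under stated assumptions, not the service's implementation: word mode is assumed to split on whitespace with no lowercasing or punctuation stripping, a text shorter than the shingle size is assumed to yield no shingles, and the similarity of two empty shingle sets is assumed to be 0. The function names are hypothetical. `isNearDuplicate` is omitted because the threshold behind it is not documented here.

```python
def shingles(text: str, size: int = 3, mode: str = "word") -> set:
    """Return the set of contiguous n-gram shingles of `text`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if mode == "word":
        tokens = text.split()  # assumption: plain whitespace tokenization
        return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}
    if mode == "char":
        return {text[i:i + size] for i in range(len(text) - size + 1)}
    raise ValueError("mode must be 'char' or 'word'")


def shingle_jaccard(a: str, b: str, size: int = 3, mode: str = "word") -> dict:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two shingle sets."""
    set_a, set_b = shingles(a, size, mode), shingles(b, size, mode)
    intersection = len(set_a & set_b)
    union = len(set_a | set_b)
    return {
        # assumption: two empty shingle sets score 0 rather than raising
        "jaccardSimilarity": intersection / union if union else 0.0,
        "mode": mode,
        "shingleSize": size,
        "shinglesA": len(set_a),
        "shinglesB": len(set_b),
        "intersection": intersection,
        "union": union,
    }


result = shingle_jaccard("the quick brown fox", "the quick red fox", 3, "word")
```

With word shingles of size 3, the first text yields "the quick brown" and "quick brown fox", and the second yields "the quick red" and "quick red fox". No shingle is shared, so the intersection is 0, the union is 4, and the similarity is 0, which agrees with the example response.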
Input schema
{
  "type": "object",
  "required": [
    "a",
    "b"
  ],
  "properties": {
    "a": {
      "type": "string"
    },
    "b": {
      "type": "string"
    },
    "shingleSize": {
      "type": "integer"
    },
    "mode": {
      "type": "string",
      "enum": [
        "char",
        "word"
      ]
    }
  }
}
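The constraints in the input schema can be checked on the client before a paid request is made. The sketch below hand-codes the rules stated above (required `a` and `b` strings, optional integer `shingleSize`, optional `mode` of "char" or "word") rather than using a JSON Schema library. The function name `validate_input` and the error wording are illustrative assumptions. The schema states no defaults and no minimum for `shingleSize`, so none are enforced here.

```python
def validate_input(body: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    # "a" and "b" are the only required properties, both strings.
    for key in ("a", "b"):
        if key not in body:
            errors.append(f"missing required field: {key}")
        elif not isinstance(body[key], str):
            errors.append(f"{key} must be a string")
    # "shingleSize" is optional; bool is excluded because it subclasses int.
    if "shingleSize" in body:
        value = body["shingleSize"]
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("shingleSize must be an integer")
    # "mode" is optional and limited to the two enum values.
    if "mode" in body and body["mode"] not in ("char", "word"):
        errors.append("mode must be 'char' or 'word'")
    return errors


ok = validate_input({"a": "the quick brown fox", "b": "the quick red fox",
                     "shingleSize": 3, "mode": "word"})
bad = validate_input({"a": "only one text", "mode": "token"})
```

Here `ok` is an empty list, while `bad` reports the missing `b` field and the unsupported mode.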
Output schema
{
  "type": "object",
  "additionalProperties": true
}