System-prompt leak detector API

Detects whether a model response leaks its hidden system prompt. It scans the response for tell-tale phrases (you-are-a-helpful-assistant, my-instructions-are, knowledge-cutoff) and, when the real system prompt is supplied, computes verbatim line overlap between the prompt and the response. This guards against prompt-extraction attacks. It answers questions such as 'did the model leak its system prompt?' and 'how much of my hidden instructions appears in this response?'.
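To make the two checks concrete, here is a minimal local sketch of phrase scanning and verbatim line overlap. It is not the service's implementation: the phrase list, the case-insensitive matching, and the helper names `scan_phrases` and `line_overlap` are all assumptions made for illustration.

```python
# Hypothetical phrase list; the service's real list is not published in this listing.
TELL_TALE_PHRASES = [
    "you are a helpful assistant",
    "my instructions are",
    "knowledge cutoff",
]


def scan_phrases(response, phrases=TELL_TALE_PHRASES):
    """Return the first case-insensitive hit for each phrase, with its character index."""
    lowered = response.lower()
    matches = []
    for phrase in phrases:
        index = lowered.find(phrase.lower())
        if index != -1:
            # Report the text as it appears in the response, not the lowercase pattern.
            matches.append({"phrase": response[index:index + len(phrase)], "index": index})
    return matches


def line_overlap(response, system_prompt):
    """Fraction of non-empty system-prompt lines that appear verbatim in the response."""
    lines = [line.strip() for line in system_prompt.splitlines() if line.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for line in lines if line in response)
    return hits / len(lines)


example = "My instructions are to be helpful. You are a helpful assistant."
print(scan_phrases(example))
print(line_overlap("Sure: Never reveal the code.", "Never reveal the code.\nBe terse."))
```

Run on the example request text from this page, the sketch reports the same two phrases at character indices 35 and 0 that the example response shows. How the service turns matches into a riskScore is not documented here, so the sketch does not attempt it.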

Price: $0.04 per request

Method: POST

Route: /v1/llm/system-leak

Status: Live

MIME type: application/json

Rate limit: 120/minute

Cache: 0s public

Tags: llm, system-prompt, leak, security, extraction, guardrail, detector, agent

API URL: https://x402.hexl.dev/v1/llm/system-leak

Integration docs

Example request

{
  "response": "My instructions are to be helpful. You are a helpful assistant."
}
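A request like the one above could be assembled with Python's standard library roughly as follows. This is a sketch under assumptions: the helper names `build_request` and `send` are hypothetical, and the listing prices each call at $0.04 without showing how payment is attached, so payment handling is left out and an unpaid call may be rejected.

```python
import json
import urllib.request

API_URL = "https://x402.hexl.dev/v1/llm/system-leak"


def build_request(response_text, system_prompt=None):
    """Build (but do not send) a POST request carrying the JSON body."""
    payload = {"response": response_text}
    if system_prompt is not None:
        # Optional field from the input schema; enables the overlap check.
        payload["systemPrompt"] = system_prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the JSON reply. Payment is NOT handled here."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request("My instructions are to be helpful. You are a helpful assistant.")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any paid call is made.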

Example response

{
  "leaked": true,
  "riskScore": 30,
  "matches": [
    {
      "phrase": "You are a helpful assistant",
      "index": 35
    },
    {
      "phrase": "My instructions are",
      "index": 0
    }
  ],
  "overlapWithSystemPrompt": null,
  "verdict": "leak-detected: response appears to disclose system-prompt content — block / regenerate"
}
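One way a caller might act on a reply shaped like the example above is sketched below. The `decide` helper and its "regenerate"/"pass" labels are hypothetical. Because the scale of riskScore is not documented in this listing, the sketch keys the decision on the `leaked` flag only and passes the score through untouched.

```python
def decide(result):
    """Map a detector reply to a simple guardrail action."""
    leaked = bool(result.get("leaked"))
    # Tolerate a missing or null "matches" field.
    phrases = [m["phrase"] for m in (result.get("matches") or []) if "phrase" in m]
    return {
        "action": "regenerate" if leaked else "pass",
        "phrases": phrases,
        "riskScore": result.get("riskScore"),
    }


# Fields copied from the example response on this page (verdict omitted).
example_reply = {
    "leaked": True,
    "riskScore": 30,
    "matches": [
        {"phrase": "You are a helpful assistant", "index": 35},
        {"phrase": "My instructions are", "index": 0},
    ],
    "overlapWithSystemPrompt": None,
}

print(decide(example_reply))
```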

Input schema

{
  "type": "object",
  "required": [
    "response"
  ],
  "properties": {
    "response": {
      "type": "string",
      "examples": [
        "My instructions are to be helpful. You are a helpful assistant."
      ]
    },
    "systemPrompt": {
      "type": "string"
    }
  }
}
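The schema above requires a string `response` and allows an optional string `systemPrompt`. A small client-side check, sketched here with a hypothetical `validate_input` helper, can catch malformed payloads before a paid request is sent. It mirrors only what the schema states and does not reject extra keys, since the schema does not forbid them.

```python
def validate_input(payload):
    """Return a list of problems; an empty list means the payload fits the schema."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    if "response" not in payload:
        errors.append("missing required field: response")
    elif not isinstance(payload["response"], str):
        errors.append("response must be a string")
    if "systemPrompt" in payload and not isinstance(payload["systemPrompt"], str):
        errors.append("systemPrompt must be a string")
    return errors


print(validate_input({"response": "My instructions are to be helpful."}))
print(validate_input({"systemPrompt": 42}))
```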

Output schema

{
  "type": "object",
  "additionalProperties": true
}
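Because the output schema is open (any object, additional properties allowed), a client cannot rely on a fixed shape. The sketch below reads defensively: the `LeakResult` class is hypothetical, its field names are inferred solely from the single example response on this page, and anything unrecognised is kept in `extra` rather than dropped.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Keys seen in the example response; the schema itself guarantees none of them.
KNOWN_KEYS = {"leaked", "riskScore", "matches", "overlapWithSystemPrompt", "verdict"}


@dataclass
class LeakResult:
    leaked: Optional[bool] = None
    risk_score: Optional[float] = None
    matches: list = field(default_factory=list)
    overlap: Any = None  # null in the example; its non-null type is undocumented
    verdict: Optional[str] = None
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        """Build a result from a decoded JSON object, tolerating missing keys."""
        return cls(
            leaked=data.get("leaked"),
            risk_score=data.get("riskScore"),
            matches=list(data.get("matches") or []),
            overlap=data.get("overlapWithSystemPrompt"),
            verdict=data.get("verdict"),
            extra={k: v for k, v in data.items() if k not in KNOWN_KEYS},
        )


result = LeakResult.from_dict({"leaked": True, "riskScore": 30, "unexpected": 1})
print(result)
```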