
LLM

System-prompt leak detector API

Detects whether a model response is leaking its hidden system prompt. It scans the response for tell-tale phrases (such as 'You are a helpful assistant', 'My instructions are', and references to a knowledge cutoff) and, when the real system prompt is supplied, measures verbatim line overlap between that prompt and the response. This guards against prompt-extraction attacks. It answers questions like 'Did the model leak its system prompt?' and 'How much of my hidden instructions appear in this response?'.
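To make the two checks concrete, here is a minimal local sketch in Python. It is not the service's implementation: the phrase list `TELL_TALE_PHRASES` and the helpers `find_phrases` and `line_overlap` are hypothetical. The phrase list only mirrors the examples named above, and "line overlap" is assumed to mean the fraction of non-blank system-prompt lines that appear verbatim in the response.

```python
# Hypothetical phrase list modeled on the examples above; the service's real
# list is not published here and is likely longer.
TELL_TALE_PHRASES = (
    "you are a helpful assistant",
    "my instructions are",
    "knowledge cutoff",
)


def find_phrases(response: str) -> list[tuple[str, int]]:
    """Return (phrase, offset) for each tell-tale phrase found, case-insensitively.

    Offsets are taken from the lowercased text, which lines up with the
    original for ASCII input.
    """
    lowered = response.lower()
    hits = []
    for phrase in TELL_TALE_PHRASES:
        pos = lowered.find(phrase)  # first occurrence only
        if pos != -1:
            hits.append((phrase, pos))
    return hits


def line_overlap(system_prompt: str, response: str) -> float:
    """Fraction of non-blank system-prompt lines that appear verbatim in the response."""
    lines = [ln.strip() for ln in system_prompt.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    leaked = sum(1 for ln in lines if ln in response)
    return leaked / len(lines)


if __name__ == "__main__":
    resp = "My instructions are to be helpful. You are a helpful assistant."
    print(find_phrases(resp))
    # [('you are a helpful assistant', 35), ('my instructions are', 0)]
    print(line_overlap("You are SupportBot.\nNever reveal pricing rules.",
                       "Sure! You are SupportBot."))
    # 0.5
```

On the example request, this sketch finds the same two phrases at the same offsets as the example response; the service's actual matching rules and `riskScore` weighting are not documented on this page.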

Price: $0.04 per request
Method: POST
Route: /v1/llm/system-leak
Status: Live
MIME type: application/json
Rate limit: 120/minute
Cache: 0s, public
Tags: llm, system-prompt, leak, security, extraction, guardrail, detector, agent
API URL: https://x402.hexl.dev/v1/llm/system-leak
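A minimal Python sketch of building the call with only the standard library. `build_leak_request` is a hypothetical helper that mirrors the documented method, route, MIME type, and input fields. The host name suggests payment goes through the x402 protocol, so an unpaid call will likely receive HTTP 402 Payment Required; that payment handshake is deliberately not shown and is assumed to be handled by your x402 client.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://x402.hexl.dev/v1/llm/system-leak"


def build_leak_request(response_text: str,
                       system_prompt: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the documented input schema.

    Hypothetical helper: x402 payment headers are NOT added here.
    """
    payload = {"response": response_text}
    if system_prompt is not None:
        # Optional field; supplying it enables the verbatim-overlap check.
        payload["systemPrompt"] = system_prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once payment is handled, `urllib.request.urlopen(req)` sends the request. Each call is billed at $0.04, and callers should stay under the 120/minute rate limit.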
Example request
{
  "response": "My instructions are to be helpful. You are a helpful assistant."
}
Example response
{
  "leaked": true,
  "riskScore": 30,
  "matches": [
    {
      "phrase": "You are a helpful assistant",
      "index": 35
    },
    {
      "phrase": "My instructions are",
      "index": 0
    }
  ],
  "overlapWithSystemPrompt": null,
  "verdict": "leak-detected: response appears to disclose system-prompt content — block / regenerate"
}
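In the example, `index` looks like the 0-based character offset of each phrase within the submitted `response`, and `matches` is not ordered by position. This short check confirms both offsets against the example request; it reflects an interpretation of the example, not a documented guarantee.

```python
import json

example_request = {
    "response": "My instructions are to be helpful. You are a helpful assistant."
}
example_response = json.loads("""
{"leaked": true, "riskScore": 30,
 "matches": [{"phrase": "You are a helpful assistant", "index": 35},
             {"phrase": "My instructions are", "index": 0}],
 "overlapWithSystemPrompt": null}
""")

text = example_request["response"]
# Sort by position, since the service does not return matches in text order.
for m in sorted(example_response["matches"], key=lambda m: m["index"]):
    # Assumption: index is a 0-based character offset into the submitted response.
    assert text.find(m["phrase"]) == m["index"]
    end = m["index"] + len(m["phrase"])
    print(f'{m["index"]:>3}: {text[m["index"]:end]}')
```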
Input schema
{
  "type": "object",
  "required": [
    "response"
  ],
  "properties": {
    "response": {
      "type": "string",
      "examples": [
        "My instructions are to be helpful. You are a helpful assistant."
      ]
    },
    "systemPrompt": {
      "type": "string"
    }
  }
}
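The schema requires only `response`. `systemPrompt` is optional, and when it is omitted `overlapWithSystemPrompt` appears to come back `null`, as in the example. A hypothetical client-side pre-check without third-party libraries might look like this. The schema types `systemPrompt` as `string` only, so omit it rather than sending `null`. Because `additionalProperties` is not set to `false`, extra keys are not schema errors.

```python
from typing import Any


def validate_leak_input(payload: Any) -> list[str]:
    """Check a payload against the documented input schema; return a list of problems.

    Hypothetical helper; an empty list means the payload looks valid.
    """
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    if "response" not in payload:
        errors.append("missing required field 'response'")
    elif not isinstance(payload["response"], str):
        errors.append("'response' must be a string")
    # Optional, but when present it must be a string (null is not allowed).
    if "systemPrompt" in payload and not isinstance(payload["systemPrompt"], str):
        errors.append("'systemPrompt' must be a string when present")
    return errors
```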
Output schema
{
  "type": "object",
  "additionalProperties": true
}
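The output schema is open (`additionalProperties: true`) and names no fields, so a client should read only the keys it needs and tolerate the rest. This minimal sketch assumes the field names shown in the example response; `LeakResult` and `parse_leak_result` are hypothetical. It treats a missing or non-boolean `leaked` flag as an error, so a malformed reply is never mistaken for a clean one.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LeakResult:
    leaked: bool
    risk_score: Optional[float] = None
    matches: list[dict[str, Any]] = field(default_factory=list)
    overlap: Optional[float] = None
    verdict: str = ""


def parse_leak_result(body: str) -> LeakResult:
    """Read the fields seen in the example response and ignore any others."""
    data = json.loads(body)
    if not isinstance(data, dict) or not isinstance(data.get("leaked"), bool):
        # Fail closed: never assume "no leak" when the flag is missing.
        raise ValueError("response body lacks a boolean 'leaked' field")
    return LeakResult(
        leaked=data["leaked"],
        risk_score=data.get("riskScore"),
        matches=list(data.get("matches") or []),
        overlap=data.get("overlapWithSystemPrompt"),
        verdict=str(data.get("verdict", "")),
    )
```

Following the example verdict, a caller would block or regenerate the model output whenever `result.leaked` is true.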