Mode on clone: log
OWASP LLM Top 10 · LLM07:2025 · high · v1.0.0 · System
System Prompt Leakage
Detects when model responses include verbatim or near-verbatim system prompt text, blocking exposure of proprietary instructions.
Clone & start observing
Creates a Guideline policy. Observation only: nothing is blocked until you promote the policy to Strict.
The policy name defaults to the template name. Customise it to distinguish multiple instances of the same template.
Leave this field empty to apply the policy broadly, using the template's default data-classification and risk-tier filters.
Rationale
System prompts often contain business logic, allow-lists, and security guardrails. Leaking them gives attackers a roadmap for jailbreaks.
Example violation
Model response begins: "You are a customer service assistant for Acme Insurance. Your guidelines are: 1) Never discuss..."
Triggers (1)
- output: Scan model output for system-prompt fragments
Detectors (2)
- regex · fingerprint-match: Match known system-prompt fingerprints
- classifier · similarity-check: Embedding similarity against registered system prompts
Actions (2)
- block: Replace the response with a safe refusal
- log: Record the leak attempt
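A minimal sketch of how the two actions might be applied once a detector has produced a verdict, assuming the policy mode is either Guideline (observe only) or Strict (enforce). `apply_actions`, `ActionResult`, `REFUSAL_TEXT`, and the lowercase mode strings are hypothetical names chosen for illustration, not this product's API; the refusal wording is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical refusal text; the template does not specify the wording.
REFUSAL_TEXT = "Sorry, I can't share that."


@dataclass
class ActionResult:
    response: str                              # text actually returned to the user
    blocked: bool                              # True if the response was replaced
    log: list = field(default_factory=list)    # recorded leak attempts


def apply_actions(response: str, leak_detected: bool, mode: str) -> ActionResult:
    """Apply the 'log' and 'block' actions to one model response.

    mode is assumed to be "guideline" (observe only) or "strict" (enforce).
    """
    if mode not in ("guideline", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    result = ActionResult(response=response, blocked=False)
    if not leak_detected:
        return result
    # 'log' runs in both modes so leaks stay visible while observing.
    result.log.append({
        "event": "system_prompt_leak",
        "mode": mode,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    # 'block' only takes effect once the policy is promoted to Strict.
    if mode == "strict":
        result.response = REFUSAL_TEXT
        result.blocked = True
    return result
```

In Guideline mode a flagged response passes through unchanged but still produces a log record; in Strict mode the same input is replaced by the refusal.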
Tunable parameters (2)
Similarity threshold (advanced · number)
Cosine similarity above which output is flagged as a system-prompt leak.
Default: 0.85
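The similarity check compares an embedding of the model output with an embedding of each registered system prompt. The sketch below assumes the embeddings are already available as plain lists of floats (how they are produced is not specified here) and uses a strict `>` comparison to follow "above which"; `cosine_similarity` and `is_similarity_leak` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar
    return dot / (norm_a * norm_b)


def is_similarity_leak(
    output_embedding: Sequence[float],
    prompt_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.85,  # template default
) -> bool:
    """Flag the output if its similarity to any registered prompt exceeds threshold."""
    return any(
        cosine_similarity(output_embedding, p) > threshold
        for p in prompt_embeddings
    )
```

For example, `[1.0, 1.0]` against `[1.0, 0.0]` scores about 0.707, which stays below the 0.85 default, while two nearly parallel vectors score close to 1.0 and are flagged.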
Fingerprint phrases (advanced · keywords)
Distinctive phrases from your system prompts.
Default: ["You are a","Your guidelines are","Never discuss"]
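One plausible way to implement the regex fingerprint-match detector is to escape each configured phrase and join them into a single alternation. Case-insensitive matching and returning the list of hits are assumptions for illustration; `build_fingerprint_pattern` and `find_fingerprints` are hypothetical names.

```python
import re
from typing import Iterable, List

DEFAULT_PHRASES = ["You are a", "Your guidelines are", "Never discuss"]


def build_fingerprint_pattern(phrases: Iterable[str]) -> re.Pattern:
    """Compile the phrases into one case-insensitive alternation pattern."""
    # Longest phrases first, so overlapping alternatives prefer the fuller match.
    ordered = sorted((p for p in phrases if p), key=len, reverse=True)
    if not ordered:
        raise ValueError("at least one non-empty phrase is required")
    # re.escape keeps punctuation in a phrase from acting as regex syntax.
    return re.compile("|".join(re.escape(p) for p in ordered), re.IGNORECASE)


def find_fingerprints(text: str, pattern: re.Pattern) -> List[str]:
    """Return every fingerprint phrase found in text, in order of appearance."""
    return [m.group(0) for m in pattern.finditer(text)]
```

Run against the example violation above, all three default phrases match. Because short defaults such as "You are a" can also appear in ordinary replies, replacing them with phrases that are distinctive to your own prompts should lower the false-positive rate.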
Template defaults (suggested target after promotion)
Suggested mode: block
Risk tiers: not set
Data classifications: not set
Departments: not set
Cloned policies start in Guideline mode. Use the promotion wizard to flip to Strict once you trust the false-positive rate.
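The promotion wizard's own criteria are not described here. As a hedged sketch, assuming you review a sample of responses observed in Guideline mode and label each as a real leak or not, the false-positive rate is the share of genuinely benign responses that the policy flagged, FP / (FP + TN). `false_positive_rate`, `ready_for_strict`, and the 1% cutoff are hypothetical, not values taken from the template.

```python
from typing import Iterable, Tuple


def false_positive_rate(samples: Iterable[Tuple[bool, bool]]) -> float:
    """Compute FP / (FP + TN) over reviewed (flagged, actually_leaked) pairs."""
    fp = tn = 0
    for flagged, leaked in samples:
        if not leaked:          # only benign responses count toward FPR
            if flagged:
                fp += 1
            else:
                tn += 1
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("need at least one reviewed non-leak response")
    return fp / negatives


def ready_for_strict(samples: Iterable[Tuple[bool, bool]], max_fpr: float = 0.01) -> bool:
    """Hypothetical promotion check: FPR at or below max_fpr."""
    return false_positive_rate(samples) <= max_fpr
```

This definition needs unflagged responses in the review sample as well. Reviewing only flagged events gives the fraction of alerts that were false alarms, which is a different number.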