Atlas AI
OWASP LLM Top 10 · LLM07:2025 · high · v1.0.0 · System

System Prompt Leakage

Detects when model responses include verbatim or near-verbatim system prompt text, blocking exposure of proprietary instructions.

Clone & start observing

Cloning creates a Guideline policy that only observes: nothing is blocked until you promote it to Strict.

Mode on clone: log
The policy name defaults to the template name. Customise it to distinguish multiple instances of the same template.
Leave the scope empty to apply the policy broadly via the template's default data-classification and risk-tier filters.
Rationale

System prompts often contain business logic, allow-lists, and security guardrails. Leaking them gives attackers a roadmap for jailbreaks.

Example violation
Model response begins: "You are a customer service assistant for Acme Insurance. Your guidelines are: 1) Never discuss..."
Triggers (1)
  • output: Scan model output for system-prompt fragments
Detectors (2)
  • regex: fingerprint-match
    Match known system-prompt fingerprints
  • classifier: similarity-check
    Embedding similarity vs registered system prompts
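The two detectors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names are hypothetical, fingerprint matching is treated as a case-insensitive literal search, and a toy bag-of-words vector stands in for the real embedding model behind the similarity check.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Template defaults for the "Fingerprint phrases" parameter.
DEFAULT_FINGERPRINTS = ["You are a", "Your guidelines are", "Never discuss"]


def fingerprint_match(output: str, phrases: list[str]) -> list[str]:
    """regex detector (sketch): return every fingerprint phrase found in the output."""
    return [p for p in phrases if re.search(re.escape(p), output, re.IGNORECASE)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_check(output: str, registered_prompts: list[str],
                     threshold: float = 0.85) -> float | None:
    """classifier detector (sketch): best score vs registered prompts, if above threshold."""
    out_vec = embed(output)
    best = max((cosine(out_vec, embed(p)) for p in registered_prompts), default=0.0)
    return best if best > threshold else None
```

Run against the example violation above, `fingerprint_match` returns all three default phrases, while a verbatim copy of a registered prompt scores close to 1.0 in `similarity_check`.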
Actions (2)
  • block: Replace the response with a safe refusal
  • log: Record the leak attempt
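The two actions can be wired to a detection result roughly as below. This is a hypothetical sketch: `apply_actions`, the `Detection` type, the refusal wording, and the in-memory `audit_log` are illustrative names, and it assumes that log mode only records the event while block mode both records it and replaces the response.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger("system_prompt_leakage")  # hypothetical logger name

# Hypothetical refusal text used by the block action.
SAFE_REFUSAL = "I'm sorry, but I can't share my configuration or instructions."


@dataclass
class Detection:
    """Combined result of the detectors for one model response."""
    leaked: bool
    reasons: list[str] = field(default_factory=list)


def apply_actions(response: str, detection: Detection, mode: str = "log",
                  audit_log: list[dict] | None = None) -> str:
    """Record every leak; replace the response only when mode is 'block'."""
    if not detection.leaked:
        return response
    record = {"event": "system_prompt_leak", "mode": mode, "reasons": list(detection.reasons)}
    if audit_log is not None:
        audit_log.append(record)
    logger.warning("System-prompt leak detected (mode=%s): %s", mode, detection.reasons)
    if mode == "block":
        return SAFE_REFUSAL
    return response  # log mode: observe only, pass the response through
```

Keeping the logging step unconditional means the same audit trail exists before and after promotion, so false-positive rates measured in log mode remain comparable with those seen in block mode.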
Tunable parameters (2)
Similarity threshold
Advanced · number
Cosine similarity above which output is flagged as a system-prompt leak.
Default: 0.85
Fingerprint phrases
Advanced · keywords
Distinctive phrases from your system prompts.
Default: ["You are a","Your guidelines are","Never discuss"]
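One way to hold these two parameters in code is a small validated config object. This is a sketch under assumptions: the `LeakageParams` name and its validation rules are hypothetical, and the comparison is strict because the description says output is flagged only above the threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LeakageParams:
    """Tunable parameters for the System Prompt Leakage template (defaults shown)."""
    similarity_threshold: float = 0.85
    fingerprint_phrases: tuple[str, ...] = ("You are a", "Your guidelines are", "Never discuss")

    def __post_init__(self) -> None:
        # Cosine similarity is bounded by [-1, 1], so a threshold outside it is meaningless.
        if not -1.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must lie in [-1, 1]")
        # An empty phrase would match every response.
        if any(not p.strip() for p in self.fingerprint_phrases):
            raise ValueError("fingerprint phrases must be non-empty")

    def is_flagged(self, cosine_score: float) -> bool:
        """Flag only scores strictly above the threshold."""
        return cosine_score > self.similarity_threshold


# Example: replace the generic defaults with phrases unique to your own prompt.
acme = LeakageParams(fingerprint_phrases=("Acme Insurance", "Your guidelines are"))
```

Generic defaults such as "You are a" also appear in ordinary replies, so swapping them for phrases that occur only in your own prompts is likely to reduce false positives; raising the threshold has a similar effect on the similarity check.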
Template defaults (suggested target after promotion)
Suggested mode
block
Risk tiers
Data classifications
Departments

Cloned policies start in Guideline mode. Once you are satisfied with the false-positive rate, use the promotion wizard to switch the policy to Strict.
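The decision to promote can be made concrete by reviewing the leak events logged in Guideline mode. The sketch below is hypothetical: `ReviewedFlag`, the 5% ceiling, and the 50-review minimum are illustrative values, not template defaults, and "false-positive rate" is measured here as the share of flagged responses that reviewers judged not to be real leaks (strictly, a false discovery rate).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewedFlag:
    """A logged leak event after human review (hypothetical record shape)."""
    event_id: str
    true_leak: bool


def false_flag_share(flags: list[ReviewedFlag]) -> float:
    """Fraction of flagged responses that reviewers marked as not a real leak."""
    if not flags:
        return 0.0
    return sum(1 for f in flags if not f.true_leak) / len(flags)


def ready_for_strict(flags: list[ReviewedFlag], max_share: float = 0.05,
                     min_reviewed: int = 50) -> bool:
    """Recommend promotion only with enough reviews and few false flags."""
    return len(flags) >= min_reviewed and false_flag_share(flags) <= max_share
```

Requiring a minimum number of reviewed events guards against promoting on a handful of lucky samples, where a low false-flag share says little about production behaviour.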