OWASP LLM Top 10 · LLM07:2025 · high · v1.0.0 · System

System Prompt Leakage

Detects when a model response includes verbatim or near-verbatim system prompt text, so that exposure of proprietary instructions can be blocked.

📘 Clone & start observing

Creates a Guideline policy. Observation only — nothing is blocked until you promote to Strict.

Mode on clone: log

Policy name

Defaults to template name. Customise to distinguish multiple instances of the same template.

Leave empty to apply broadly via the template's default data-classification / risk-tier filters.

Rationale

System prompts often contain business logic, allow-lists, and security guardrails. Leaking them gives attackers a roadmap for jailbreaks.

Example violation

Model response begins: "You are a customer service assistant for Acme Insurance. Your guidelines are: 1) Never discuss..."

Triggers (1)

Detectors (2)

Actions (2)

Tunable parameters (2)

Similarity threshold

advanced · number

Cosine similarity score above which a model response is flagged as a system-prompt leak.

Default: 0.85
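
As a rough illustration of how a similarity threshold like this behaves, here is a minimal Python sketch. It assumes a simple word-count vector as a stand-in for whatever text representation the detector actually uses, which this page does not specify. The names `cosine_similarity` and `is_leak` are hypothetical and not part of the product.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.85  # the template default shown above


def _vectorize(text: str) -> Counter:
    # Assumption: lowercase word counts stand in for a real embedding.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = _vectorize(a), _vectorize(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # An empty string has a zero vector; treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


def is_leak(response: str, system_prompt: str,
            threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Flag the response when its similarity to the prompt exceeds the threshold."""
    return cosine_similarity(response, system_prompt) > threshold
```

In this sketch, lowering the threshold flags looser paraphrases at the cost of more false positives, while raising it flags only close copies.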

Fingerprint phrases

advanced · keywords

Distinctive phrases from your system prompts.

Default: ["You are a","Your guidelines are","Never discuss"]
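
A fingerprint check of this kind can be sketched in Python as a plain phrase scan. This is only an illustration: it assumes case-insensitive matching with collapsed whitespace, which this page does not confirm, and `matched_fingerprints` is a hypothetical helper name.

```python
from typing import List, Sequence

# The template defaults listed above.
DEFAULT_FINGERPRINTS = ["You are a", "Your guidelines are", "Never discuss"]


def _normalize(text: str) -> str:
    # Assumption: matching ignores case and runs of whitespace.
    return " ".join(text.lower().split())


def matched_fingerprints(response: str,
                         phrases: Sequence[str] = DEFAULT_FINGERPRINTS) -> List[str]:
    """Return the fingerprint phrases that occur in the response."""
    normalized = _normalize(response)
    return [p for p in phrases if _normalize(p) in normalized]
```

Applied to the example violation above, all three default phrases match. Short generic phrases such as "You are a" can also occur in harmless responses, so replacing the defaults with wording specific to your own system prompts is likely to reduce false positives.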

Template defaults (suggested target after promotion)

Suggested mode

block

Risk tiers

—

Data classifications

—

Departments

—

Cloned policies start in Guideline mode. Use the promotion wizard to flip to Strict once you trust the false-positive rate.