Skip to main content

COMPEL Glossary / GL-58

Jailbreak Resistance

A composite score of an AI system's ability to reject adversarial prompts designed to bypass its safety policies, measured against a fixed, versioned red-team test suite.

What this means in practice

Jailbreak resistance is reported as the percentage of attack prompts successfully refused, broken down by attack family (role-play, obfuscation, instruction injection, multi-turn escalation) so that weakening in a single class is visible.

Context in the COMPEL framework

A core Safety metric. Evaluated on a quarterly red-team cadence and before every major release.

Where you see this

Jailbreak Resistance is most commonly referenced when teams work across the Produce , Evaluate and Learn stages — especially within the Agent Governance layer . It appears in governance artifacts, assessment instruments, and delivery playbooks wherever COMPEL is operationalized.

Related COMPEL stages

Related domains

Synonyms

jailbreak score , adversarial refusal rate , safety-bypass resistance

See also

  • Trust & Performance Dimensions — The eight continuous-measurement axes against which every AI transformation is evaluated in COMPEL: Value, Reliability, Safety, Responsibility, Compliance, Security, Sustainability, and Adoption.
  • Prompt Injection Resistance — The measured ability of an AI system to reject or neutralize adversarial instructions injected via user input, retrieved documents, tool output, or other untrusted content channels.
  • Grounding Score — The percentage of generative model outputs whose factual claims can be traced to a verifiable source in the supplied context, computed by a grounding evaluator over a fixed test set.