The COMPEL Glossary Graph visualizes relationships between framework terminology, showing how concepts interconnect across domains, stages, and pillars. Term nodes cluster by pillar affiliation while cross-references reveal semantic dependencies — for example, how risk appetite connects to control effectiveness, model governance, and assurance requirements. This network representation helps practitioners navigate the framework vocabulary and understand that COMPEL terminology forms a coherent conceptual system rather than isolated definitions.
COMPEL Glossary / GL-58
Jailbreak Resistance
A composite score of an AI system's ability to reject adversarial prompts designed to bypass its safety policies, measured against a fixed, versioned red-team test suite.
What this means in practice
Jailbreak resistance is reported as the percentage of attack prompts successfully refused, broken down by attack family (role-play, obfuscation, instruction injection, multi-turn escalation) so that weakening in a single class is visible.
Context in the COMPEL framework
A core Safety metric. Evaluated on a quarterly red-team cadence and before every major release.
Where you see this
Jailbreak Resistance is most commonly referenced when teams work across the Produce , Evaluate and Learn stages — especially within the Agent Governance layer . It appears in governance artifacts, assessment instruments, and delivery playbooks wherever COMPEL is operationalized.
Related COMPEL stages
Related domains
Synonyms
jailbreak score , adversarial refusal rate , safety-bypass resistance
See also
- Trust & Performance Dimensions — The eight continuous-measurement axes against which every AI transformation is evaluated in COMPEL: Value, Reliability, Safety, Responsibility, Compliance, Security, Sustainability, and Adoption.
- Prompt Injection Resistance — The measured ability of an AI system to reject or neutralize adversarial instructions injected via user input, retrieved documents, tool output, or other untrusted content channels.
- Grounding Score — The percentage of generative model outputs whose factual claims can be traced to a verifiable source in the supplied context, computed by a grounding evaluator over a fixed test set.