Skip to main content

Documentation Index

Fetch the complete documentation index at: https://cognisafeltd.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Cognisafe scores every LLM request and response against the OWASP LLM Top 10 — the industry-standard taxonomy of security risks in LLM-powered applications. Scoring is asynchronous: it happens entirely off the proxy hot path. Your users see no added latency. Scores appear in the dashboard within seconds of each request. All scores use the 1–5 Likert severity scale.

Coverage table

OWASP IDCategoryScorer nameWhat it detectsAvailable from
LLM01Prompt Injectionjailbreak_detectionAttempts to override system instructions, bypass safety guidelines, or manipulate model behaviour via crafted user inputFree
LLM02Sensitive Information Disclosurepii_detectionPII in model responses: names, email addresses, phone numbers, SSNs, credit card numbers, home addressesFree
LLM03Supply ChainThird-party model and plugin risk (policy controls, coming soon)Professional
LLM04Data and Model Poisoningdata_poisoningPrompts designed to inject poisoned content into RAG pipelines, knowledge bases, or model contextProfessional
LLM05Improper Output Handlingcontent_safetyHarmful, dangerous, violent, or policy-violating content in model responsesFree
LLM06Excessive AgencyAgentic over-reach detection (coming soon)Business
LLM07System Prompt Leakagepii_detectionSystem prompt contents leaked in model responsesProfessional
LLM08Vector and Embedding Weaknessesvector_weaknessAdversarial inputs targeting vector databases, embedding models, or semantic searchProfessional
LLM09MisinformationFactual accuracy scoring (coming soon, requires reference corpus)Business
LLM10Unbounded Consumptionunbounded_consumptionPrompts designed to cause excessive token or compute consumption (denial-of-service patterns)Professional

Scorer descriptions

content_safety (LLM05)

Checks whether the model’s response contains harmful, dangerous, violent, or policy-violating content. Triggers on: explicit instructions for harm, hate speech, graphic violence, CSAM-adjacent content.

pii_detection (LLM02, LLM07)

Checks whether the model’s response leaks PII. Covers: full names, email addresses, phone numbers, Social Security numbers, credit card numbers, home addresses, passport numbers, and similar sensitive personal data.

jailbreak_detection (LLM01)

Checks whether the prompt attempts to bypass AI safety guidelines or override system instructions. Covers: DAN-style prompts, role-play overrides, instruction injection via user content, indirect prompt injection.

data_poisoning (LLM04)

Checks whether the prompt attempts to inject content designed to corrupt a knowledge base, RAG pipeline, or model context — content intended to influence future model responses rather than elicit an immediate answer.

vector_weakness (LLM08)

Checks whether the prompt appears to exploit weaknesses in vector databases or embedding models — for example, queries crafted to retrieve unintended documents, bypass semantic filters, or manipulate similarity search results.

unbounded_consumption (LLM10)

Checks whether the prompt appears designed to cause excessive resource consumption: extremely long or recursive inputs, content designed to exhaust tokens or API limits, or patterns that trigger maximum-length completions.

Scorer configuration

Scorer definitions live in evals/scorers.yaml. See Custom scorers for information on adding your own.