Learn -- OWASP LLM Top 10 & AI Security

How Nemesis compares to other tools

vs Garak

Matches or exceeds on all LLM model-layer tests. Adds API security, injection probing, agentic chain attacks, and embedding leakage that Garak does not cover.

vs Promptfoo

Matches on OWASP LLM Top 10. Adds model identity fingerprinting, agentic chain, EchoLeak/Copilot CVE tests, and full injection probing. Promptfoo adds BOLA/BFLA which requires multi-token auth.

vs Akto / PyRIT

Adds full LLM model-layer and agentic testing Akto lacks. Adds API-layer tests PyRIT lacks. Only gap vs PyRIT is LLM-as-judge scoring -- we use regex, PyRIT uses a second model call.

Real-world incidents Nemesis now tests for

CVE-2025-32711 -- CVSS 9.3

EchoLeak (Microsoft Copilot)

Hidden instructions in a shared document caused Copilot to return the user private recent emails when asked for a summary. No click, no download -- just a question to an AI assistant. Module: Embedding & RAG Leakage.

CVE-2025-53773 -- CVSS 9.6

GitHub Copilot source code injection

A markdown image tag hidden in a source code file caused Copilot to send sensitive data to an attacker-controlled URL. 10M+ developers in scope. Module: Embedding & RAG Leakage.

McKinsey breach -- March 2026

SQL injection via AI chatbot

SQL injection delivered through an AI chatbot interface reached a backend database. $20 and 2 hours to full breach. Module: Injection Probing.

Multi-agent pipeline attacks -- 2025-2026

Cross-agent instruction propagation

Malicious instructions injected into agent A propagate to agent B in automated pipelines, bypassing per-agent safety checks. Module: Agentic Chain Attacks.

OWASP LLM Top 10 (2025)

All 10 categories covered across the 16 attack modules.

LLM01Covered11 tests

Prompt Injection

11 variants - direct, indirect, RAG pipeline, tool output injection

Learn more -->

LLM02Covered12 tests

Sensitive Data Disclosure

PII leakage, credential extraction, PHI inference, exfiltration via code

Learn more -->

LLM03Covered5 tests

Supply Chain

Third-party plugin hijacking, model provenance, malicious tool substitution

Learn more -->

LLM04Covered6 tests

Data Poisoning

Knowledge base injection, malicious context influence

Learn more -->

LLM05Covered5 tests

Improper Output Handling

Exfiltration code, JS cookie theft, markdown pixels, DNS channels

Learn more -->

LLM06Covered5 tests

Excessive Agency

Agentic privilege escalation, unauthorized tool invocation, scope creep injection

Learn more -->

LLM07Covered6 tests

System Prompt Leakage

6 extraction vectors - translation, token completion, persona-based

Learn more -->

LLM08Covered5 tests

Vector & Embedding Weaknesses

RAG poisoning, indirect injection, EchoLeak cross-context leakage, Copilot image tag injection, embedding inversion

Learn more -->

LLM09Covered6 tests

Misinformation

False fact assertion, hallucination amplification, disinformation

Learn more -->

LLM10Covered14 tests

Unbounded Consumption

Jailbreaks, model extraction, fingerprinting, resource exhaustion

Learn more -->

Extended coverage

Six additional modules beyond OWASP LLM Top 10 -- API security, injection attacks, toxicity, model identity, agentic pipelines, and embedding leakage.

OWASP API:2023Covered6 tests

API Security

Rate limiting, CORS misconfiguration, verbose error disclosure, auth header bypass, HTTP method confusion, response metadata leakage. Covers the API layer around your LLM -- the attack surface that brought down McKinsey.

Extends LLM01Covered6 tests

Injection Probing

SQL injection via chat interface, NoSQL operator injection, OS command injection, server-side template injection, path traversal, and SSRF via chatbot -- the full McKinsey infrastructure attack chain delivered through the model layer.

Extends LLM09Covered6 tests

Toxicity & Harmful Content

Hate speech generation, self-harm facilitation, targeted harassment, dangerous medical misinformation, radicalization content, and harmful synthesis instructions via roleplay framing. Closes the Garak toxicity testing gap.

Extends LLM07Covered6 tests

Model Identity & Fingerprinting

Base model fingerprinting, training data extraction, fine-tuning detection, version enumeration, behavioral fingerprinting via response patterns, and token probability probing. Critical for organizations that must not reveal which LLM powers their product.

Extends LLM01Covered5 tests

Agentic Chain Attacks

Cross-agent instruction injection, tool output poisoning, privilege escalation via agent delegation, persistent memory poisoning, and recursive agent loop exploitation. The next frontier of prompt injection in multi-agent pipelines.

Extends LLM08Covered6 tests

Embedding & RAG Leakage

Includes specific tests for EchoLeak (CVE-2025-32711, CVSS 9.3) and GitHub Copilot source code injection (CVE-2025-53773, CVSS 9.6). Also covers RAG document reconstruction, embedding inversion, cross-user context leakage, and semantic memory extraction.

Attack modules

110+

Test cases

10/10

OWASP LLM covered

CVEs explicitly tested

Test your model against all 16 modules

110+ real adversarial tests including EchoLeak and GitHub Copilot CVE vectors. No account. No data stored.

Run free scan -->

Learn AI security

How Nemesis compares to other tools

Real-world incidents Nemesis now tests for

OWASP LLM Top 10 (2025)

Prompt Injection

Sensitive Data Disclosure

Supply Chain

Data Poisoning

Improper Output Handling

Excessive Agency

System Prompt Leakage

Vector & Embedding Weaknesses

Misinformation

Unbounded Consumption

Extended coverage

API Security

Injection Probing

Toxicity & Harmful Content

Model Identity & Fingerprinting

Agentic Chain Attacks

Embedding & RAG Leakage

Test your model against all 16 modules