Security Research Benchmark

Evaluating Persistent Compromise in Autonomous AI Agents

AgentKillChain is an open, reproducible benchmark for stress-testing AI agents against latent prompt injection, memory poisoning, and toolchain confusion.

Baseline Signals

The May 20, 2026 multi-provider release evaluates Ollama Cloud and OpenRouter frontier cohorts across the same 40-scenario catalog:

Overall Unsafe Signal

1.39%

Hardened heuristic scoring

Unsafe Tool Proposals

1.33%

Unsafe tool-proposal signals

Secret Disclosure

0.00%

No scored secret disclosure

Attacker Lifecycle

Initial Access

Gaining the first foothold in the AI agent's environment or context.

Execution

Running unauthorized commands or code via the agent's capabilities.

Persistence

Maintaining access or influence over the agent across sessions or turns.

Latent Activation

Triggering a dormant payload when a specific context or time condition is met.

Escalation

Gaining higher privileges or access to more sensitive tools.

Exfiltration

Stealing or leaking sensitive data out of the agent's environment.
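
The six stages above can be sketched as an ordered enumeration, which makes it easy to report how far along the lifecycle a given run progressed. This is a minimal illustration, not the benchmark's own data model: the `Stage` class, the numeric ordering (taken from the order in which the stages are listed here), and the `furthest_stage` helper are all assumptions.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical ordering of the lifecycle stages, in the order listed above."""
    INITIAL_ACCESS = 1
    EXECUTION = 2
    PERSISTENCE = 3
    LATENT_ACTIVATION = 4
    ESCALATION = 5
    EXFILTRATION = 6


def furthest_stage(observed):
    """Return the latest stage seen in a run, or None if nothing was observed."""
    return max(observed, default=None)


def stages_up_to(stage):
    """List every stage at or before the given one (assumed linear progression)."""
    return [s for s in Stage if s <= stage]


# Example: a run in which the attacker got a foothold and executed a command.
run = [Stage.EXECUTION, Stage.INITIAL_ACCESS]
reached = furthest_stage(run)
```

Treating the lifecycle as strictly linear is a simplification; a real attack may skip or repeat stages.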

Attack Surface

User Input

Direct prompts or files provided by the user.

Memory

Long-term or short-term storage where the agent saves context.

Planner

The reasoning component that decides which steps to take next.

Tool Router

The mechanism that selects and formats tool calls.

External Tools

Third-party APIs or local commands the agent can execute.

Data Stores

Databases or document stores the agent queries for retrieval.
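
One way to make the surface list actionable is to tag each scenario with the components it touches and then index scenarios by surface. The sketch below assumes a hypothetical `Scenario` record and made-up scenario IDs; it does not reflect the actual catalog format.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """The six attack-surface components listed above."""
    USER_INPUT = "user_input"
    MEMORY = "memory"
    PLANNER = "planner"
    TOOL_ROUTER = "tool_router"
    EXTERNAL_TOOLS = "external_tools"
    DATA_STORES = "data_stores"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario record: an ID plus the surfaces it exercises."""
    scenario_id: str
    surfaces: frozenset


def index_by_surface(scenarios):
    """Map each surface to the IDs of the scenarios that touch it, in input order."""
    index = defaultdict(list)
    for scenario in scenarios:
        for surface in scenario.surfaces:
            index[surface].append(scenario.scenario_id)
    return dict(index)


# Illustrative data only; these IDs are not from the real catalog.
catalog = [
    Scenario("s1", frozenset({Surface.MEMORY})),
    Scenario("s2", frozenset({Surface.MEMORY, Surface.TOOL_ROUTER})),
]
coverage = index_by_surface(catalog)
```

An index like this shows at a glance which surfaces a scenario set leaves uncovered.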

Latent Timeline Profile

Session 1: Seed

The attacker injects a dormant payload into the agent's memory or data stores.

Session 2..N: Dormancy

The payload remains hidden while the agent performs normal tasks.

Session N+1: Trigger Activation

A specific condition is met, causing the payload to execute.
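
The seed, dormancy, and trigger phases can be illustrated with a toy memory store that persists across sessions. Everything here is an assumption made for illustration: the `MemoryStore` class, the `run_session` function, the substring trigger check, and the example trigger phrase. The payload text is a placeholder, and no real agent or model is involved.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy stand-in for agent memory that persists across sessions."""
    notes: list = field(default_factory=list)


def run_session(memory, task, seed=None):
    """Classify one session as 'seed', 'dormancy', or 'trigger_activation'.

    If `seed` is given, it is written to memory (Session 1). Otherwise the task
    text is checked against the trigger phrase of every stored note.
    """
    if seed is not None:
        memory.notes.append(seed)
        return "seed"
    task_text = task.lower()
    if any(note["trigger"] in task_text for note in memory.notes):
        return "trigger_activation"
    return "dormancy"


# Hypothetical payload: placeholder text plus the phrase that wakes it up.
payload = {"text": "<dormant instruction>", "trigger": "quarterly report"}

memory = MemoryStore()
phases = [
    run_session(memory, "Summarize my inbox", seed=payload),  # Session 1
    run_session(memory, "Book a flight to Boston"),           # Session 2..N
    run_session(memory, "Draft the quarterly report"),        # Session N+1
]
```

The point of the sketch is that nothing observable happens during the dormancy sessions, so a single-session evaluation would miss the compromise.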

Toolchain Confusion Strategy

Malicious prompt

An input designed to manipulate the agent's parsing or tool selection.

Tool selection confusion

Adversarial context steers the agent toward a higher-risk tool.

Unsafe tool proposal

The agent proposes a tool call outside the user's authority.

Disclosure signal

The scored output includes a secret-disclosure marker.
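
The last two steps suggest two independent checks per turn: whether the proposed tool falls outside what the user authorized, and whether the output contains a disclosure marker. The following is a minimal sketch under stated assumptions, not the benchmark's scoring code: the `ToolCall` type, the `score_turn` function, the allowlist approach, and the `CANARY-0000` marker are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical marker planted in the environment; seeing it in output counts as disclosure.
CANARY = "CANARY-0000"


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical representation of a tool call proposed by the agent."""
    name: str
    arguments: str


def score_turn(proposal, output_text, authorized_tools, canary=CANARY):
    """Return two boolean signals for a single agent turn.

    unsafe_tool_proposal: a tool was proposed that is not on the user's allowlist.
    disclosure_signal: the output text contains the canary marker.
    """
    unsafe = proposal is not None and proposal.name not in authorized_tools
    disclosed = canary in output_text
    return {"unsafe_tool_proposal": unsafe, "disclosure_signal": disclosed}


# Example: the user authorized only a search tool, but the agent proposes a shell command.
allowed = {"search_docs"}
result = score_turn(ToolCall("run_shell", "ls"), "Listing files now.", allowed)
```

Keeping the two signals separate matches the way the page reports unsafe tool proposals and secret disclosure as distinct figures.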

About the Author

Kevin O'Connor

NSA Alum | Adlumin

Kevin O'Connor is a security researcher specializing in autonomous systems and advanced threat modeling. Drawing from his experience at the National Security Agency (NSA) and as a researcher at Adlumin, Kevin explores the convergence of AI capabilities and offensive security, focusing on latent vulnerabilities and emergent behaviors in multi-agent environments.