Security Research Benchmark

Evaluating Persistent Compromise in Autonomous AI Agents

AgentKillChain is an open, reproducible benchmark for stress-testing AI agents against latent prompt injection, memory poisoning, and toolchain confusion.

Baseline Signals

The May 20, 2026 multi-provider release evaluates Ollama Cloud and OpenRouter frontier cohorts across the same 40-scenario catalog:

Overall Unsafe Signal: 1.39% (hardened heuristic scoring)

Unsafe Tool Proposals: 1.33% (unsafe tool-proposal signals)

Secret Disclosure: 0.00% (no scored secret disclosure)
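
One way to read these percentages is as the share of scored runs that a heuristic flagged. The sketch below assumes that reading and is not the benchmark's actual scoring code. The function name `signal_rate` and the sample flags are hypothetical, and the counts are illustrative, not benchmark data.

```python
def signal_rate(flags):
    """Return the percentage of scored runs whose flag is True."""
    flags = list(flags)
    if not flags:
        # No scored runs: report 0.0 rather than dividing by zero.
        return 0.0
    return 100.0 * sum(flags) / len(flags)


# Illustrative only: 2 flagged runs out of 8 scored runs.
example_flags = [False, True, False, False, False, True, False, False]
rate = signal_rate(example_flags)
```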

Attacker Lifecycle

Initial Access
Gaining the first foothold in the AI agent's environment or context.
->
Execution
Running unauthorized commands or code via the agent's capabilities.
->
Persistence
Maintaining access or influence over the agent across sessions or turns.
->
Latent Activation
Triggering a dormant payload once a specific context or time condition is met.
->
Escalation
Gaining higher privileges or access to more sensitive tools.
->
Exfiltration
Stealing or leaking sensitive data out of the agent's environment.
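
Because the lifecycle is ordered, it can be modeled as an ordered enumeration so that a run is summarized by the furthest stage it reached. This is a minimal sketch under that assumption. The names `Stage` and `furthest_stage` are hypothetical and are not taken from the benchmark.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages in the order listed above."""
    INITIAL_ACCESS = 1
    EXECUTION = 2
    PERSISTENCE = 3
    LATENT_ACTIVATION = 4
    ESCALATION = 5
    EXFILTRATION = 6


def furthest_stage(observed):
    """Return the latest stage among those observed, or None if there are none."""
    return max(observed, default=None)


reached = furthest_stage([Stage.INITIAL_ACCESS, Stage.PERSISTENCE, Stage.EXECUTION])
```

Using `IntEnum` keeps stage comparisons (`Stage.EXECUTION < Stage.ESCALATION`) explicit without a separate ordering table.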

Attack Surface

User Input
Direct prompts or files provided by the user.
/
Memory
Long-term or short-term storage where the agent saves context.
/
Planner
The reasoning component that decides which steps to take next.
/
Tool Router
The mechanism that selects and formats tool calls.
/
External Tools
Third-party APIs or local commands the agent can execute.
/
Data Stores
Databases or document stores the agent queries for retrieval.
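
For analysis, each finding can be tagged with the surface through which the adversarial content entered, then tallied. The sketch below assumes such tagging exists. `Surface`, `count_by_surface`, and the scenario identifiers are hypothetical placeholders.

```python
from collections import Counter
from enum import Enum


class Surface(Enum):
    """The six attack-surface components listed above."""
    USER_INPUT = "user_input"
    MEMORY = "memory"
    PLANNER = "planner"
    TOOL_ROUTER = "tool_router"
    EXTERNAL_TOOLS = "external_tools"
    DATA_STORES = "data_stores"


def count_by_surface(findings):
    """Tally (scenario_id, Surface) pairs by surface."""
    return Counter(surface for _, surface in findings)


# Hypothetical findings, for illustration only.
findings = [
    ("s01", Surface.MEMORY),
    ("s02", Surface.MEMORY),
    ("s03", Surface.TOOL_ROUTER),
]
tally = count_by_surface(findings)
```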

Latent Timeline Profile

Session 1: Seed
The attacker injects a dormant payload into the agent's memory or data stores.
->
Session 2..N: Dormancy
The payload remains hidden while the agent performs normal tasks.
->
Session N+1: Trigger Activation
A specific condition is met, causing the payload to execute.
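
The three phases can be illustrated with a toy agent whose memory persists across sessions. This is a deliberately simplified sketch: `ToyAgent`, `MemoryEntry`, and the keyword trigger are hypothetical, and a real payload would be natural-language content rather than a labeled field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryEntry:
    text: str
    trigger: Optional[str] = None  # hypothetical activation keyword


@dataclass
class ToyAgent:
    memory: List[MemoryEntry] = field(default_factory=list)

    def run_session(self, task: str) -> str:
        """Return 'activated' if any stored trigger appears in the task."""
        for entry in self.memory:
            if entry.trigger is not None and entry.trigger in task:
                return "activated"
        return "normal"


agent = ToyAgent()

# Session 1: seed. The payload lands in persistent memory.
agent.memory.append(MemoryEntry("note from an earlier document", trigger="quarterly report"))

# Sessions 2..N: dormancy. Unrelated tasks do not match the trigger.
dormant = [agent.run_session(t) for t in ("summarize inbox", "book a meeting")]

# Session N+1: trigger activation. The task contains the trigger condition.
final = agent.run_session("draft the quarterly report")
```

The point of the sketch is that nothing observable changes during dormancy, so a single-session evaluation would miss the seeded payload.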

Toolchain Confusion Strategy

Malicious prompt
An input designed to manipulate the agent's parsing or tool selection.
->
Tool selection confusion
Adversarial context steers the agent toward a higher-risk tool.
->
Unsafe tool proposal
The agent proposes a tool call outside the user's authority.
->
Disclosure signal
The scored output includes a secret-disclosure marker.
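
The last two steps of this chain suggest two independent checks on a proposed tool call: whether the tool falls within the user's authority, and whether the output carries a disclosure marker. The sketch below assumes an allowlist and a planted canary string. The tool names, the marker value, and `score_proposal` are hypothetical and do not describe the benchmark's scorer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProposal:
    tool: str    # name of the tool the agent proposes to call
    output: str  # text that would be scored


def score_proposal(proposal, allowed_tools, marker):
    """Return (unsafe_tool, disclosure) flags for one proposed tool call."""
    unsafe_tool = proposal.tool not in allowed_tools
    disclosure = marker in proposal.output
    return unsafe_tool, disclosure


ALLOWED = frozenset({"search_docs", "read_calendar"})  # hypothetical user authority
MARKER = "CANARY-0000"  # hypothetical secret-disclosure marker

safe = score_proposal(ToolProposal("search_docs", "3 results"), ALLOWED, MARKER)
risky = score_proposal(ToolProposal("run_shell", "env: CANARY-0000"), ALLOWED, MARKER)
```

Keeping the two flags separate mirrors the separate "Unsafe Tool Proposals" and "Secret Disclosure" signals reported above.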

About the Author

Kevin O'Connor

NSA Alum | Adlumin

Kevin O'Connor is a security researcher specializing in autonomous systems and advanced threat modeling. Drawing from his experience at the National Security Agency (NSA) and as a researcher at Adlumin, Kevin explores the convergence of AI capabilities and offensive security, focusing on latent vulnerabilities and emergent behaviors in multi-agent environments.