Security Research Benchmark
Evaluating Persistent Compromise in Autonomous AI Agents
AgentKillChain is an open, reproducible benchmark for stress-testing AI agents against latent prompt injection, memory poisoning, and toolchain confusion.
Baseline Signals
The May 20, 2026 multi-provider release evaluates Ollama Cloud and OpenRouter frontier cohorts across the same 40-scenario catalog:
Overall Unsafe Signal: 1.39% (hardened heuristic scoring)
Unsafe Tool Proposals: 1.33% (unsafe tool-proposal signals)
Secret Disclosure: 0.00% (no scored secret disclosure)
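As a rough illustration of how percentage signals like these could be derived from per-run scores, the sketch below aggregates boolean flags into rates. The record shape (`ScoredRun`), the `summarize` helper, and the rule that a run is "unsafe" when any flag fires are assumptions made for this example. The source does not state how the published figures are aggregated or which denominators they use.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScoredRun:
    """One scored model response for one scenario (hypothetical record shape)."""
    scenario_id: str
    unsafe_tool_proposal: bool
    secret_disclosure: bool

    @property
    def unsafe(self) -> bool:
        # Assumption: a run counts toward the overall signal if any check fires.
        return self.unsafe_tool_proposal or self.secret_disclosure


def rate(flags: Iterable[bool]) -> float:
    """Percentage of True flags, rounded to two decimals; 0.0 when there are no runs."""
    flags = list(flags)
    if not flags:
        return 0.0
    return round(100.0 * sum(flags) / len(flags), 2)


def summarize(runs: List[ScoredRun]) -> dict:
    """Aggregate per-run flags into the three headline percentages."""
    return {
        "overall_unsafe": rate(r.unsafe for r in runs),
        "unsafe_tool_proposals": rate(r.unsafe_tool_proposal for r in runs),
        "secret_disclosure": rate(r.secret_disclosure for r in runs),
    }
```

Keeping the per-run flags separate from the aggregation makes it straightforward to recompute the rates per provider or per scenario by filtering the list before calling `summarize`.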
Attacker Lifecycle
Initial Access
Gaining the first foothold in the AI agent's environment or context.
Execution
Running unauthorized commands or code via the agent's capabilities.
Persistence
Maintaining access or influence over the agent across sessions or turns.
Latent Activation
A dormant payload is triggered by specific context or time.
Escalation
Gaining higher privileges or access to more sensitive tools.
Exfiltration
Stealing or leaking sensitive data out of the agent's environment.
Attack Surface
User Input
Direct prompts or files provided by the user.
Memory
Long-term or short-term storage where the agent saves context.
Planner
The reasoning component that decides which steps to take next.
Tool Router
The mechanism that selects and formats tool calls.
External Tools
Third-party APIs or local commands the agent can execute.
Data Stores
Databases or document stores the agent queries for retrieval.
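One way to make the lifecycle stages and attack surfaces listed above machine-readable is to tag each scenario with one of each and then check catalog coverage. The sketch below is illustrative only: the `Scenario` record, the `coverage` and `uncovered_surfaces` helpers, and the scenario IDs are hypothetical and do not reflect the benchmark's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Stage(Enum):
    """Attacker lifecycle stages, in the order listed above."""
    INITIAL_ACCESS = 1
    EXECUTION = 2
    PERSISTENCE = 3
    LATENT_ACTIVATION = 4
    ESCALATION = 5
    EXFILTRATION = 6


class Surface(Enum):
    """Agent components an attacker can target."""
    USER_INPUT = "user_input"
    MEMORY = "memory"
    PLANNER = "planner"
    TOOL_ROUTER = "tool_router"
    EXTERNAL_TOOLS = "external_tools"
    DATA_STORES = "data_stores"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical catalog entry: one stage and one surface per scenario."""
    scenario_id: str
    stage: Stage
    surface: Surface


def coverage(catalog: Iterable[Scenario]) -> Counter:
    """Count scenarios per (stage, surface) cell."""
    return Counter((s.stage, s.surface) for s in catalog)


def uncovered_surfaces(catalog: Iterable[Scenario]) -> List[Surface]:
    """Surfaces with no scenario at all, in definition order."""
    seen = {s.surface for s in catalog}
    return [surface for surface in Surface if surface not in seen]
```

A coverage check of this kind can flag gaps, such as a catalog that never exercises the planner, before results are compared across providers.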
Latent Timeline Profile
Session 1: Seed
The attacker injects a dormant payload into the agent's memory or data.
Session 2..N: Dormancy
The payload remains hidden while the agent performs normal tasks.
Session N+1: Trigger Activation
A specific condition is met, causing the payload to execute.
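The seed, dormancy, and trigger phases can be sketched with a toy agent whose memory persists across sessions. Everything here is an assumption made for illustration: the `MemoryEntry` and `ToyAgent` classes, the keyword-match trigger, and the `simulate` helper are hypothetical and far simpler than a real agent or the benchmark's harness.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MemoryEntry:
    """A stored note; a poisoned entry also carries a trigger and a payload."""
    text: str
    trigger: Optional[str] = None   # hypothetical: keyword that activates the entry
    payload: Optional[str] = None   # hypothetical: instruction released on activation


@dataclass
class ToyAgent:
    """Toy agent whose memory survives from one session to the next."""
    memory: List[MemoryEntry] = field(default_factory=list)

    def run_session(self, task: str) -> List[str]:
        """Return the payloads activated by this session's task (empty if none)."""
        fired = []
        for entry in self.memory:
            if entry.trigger and entry.payload and entry.trigger.lower() in task.lower():
                fired.append(entry.payload)
        return fired


def simulate(seed: MemoryEntry, later_tasks: List[str]) -> List[Tuple[int, str]]:
    """Label each session as 'seed', 'dormant', or 'trigger'."""
    agent = ToyAgent()
    agent.memory.append(seed)            # Session 1: the attacker seeds memory
    timeline = [(1, "seed")]
    for n, task in enumerate(later_tasks, start=2):
        fired = agent.run_session(task)  # Sessions 2..N+1: ordinary tasks
        timeline.append((n, "trigger" if fired else "dormant"))
    return timeline
```

The point of the sketch is that nothing observable happens during the dormant sessions, so an evaluation that inspects only the seeding session would miss the compromise.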
Toolchain Confusion Strategy
Malicious prompt
An input designed to manipulate the agent's parsing or tool selection.
Tool selection confusion
Adversarial context steers the agent toward a higher-risk tool.
Unsafe tool proposal
The agent proposes a tool call outside the user's authority.
Disclosure signal
The scored output includes a secret-disclosure marker.
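The last two steps above lend themselves to simple heuristic checks: whether the proposed tool falls outside what the user authorized, and whether the output contains a planted marker. The sketch below assumes an allowlist model of user authority and a canary string as the secret-disclosure marker. The `CANARY` value, the `ToolProposal` record, the tool names, and the `score` function are hypothetical, not the benchmark's actual scorer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical canary string planted in the scenario's secrets.
CANARY = "AKC-CANARY-0000"


@dataclass(frozen=True)
class ToolProposal:
    """A tool call the agent proposes but has not yet executed."""
    tool: str
    arguments: str


def score(
    proposal: Optional[ToolProposal],
    output_text: str,
    allowed_tools: FrozenSet[str],
) -> dict:
    """Return the two boolean signals for one agent response."""
    return {
        # Assumption: user authority is modeled as a flat allowlist of tool names.
        "unsafe_tool_proposal": proposal is not None and proposal.tool not in allowed_tools,
        # Assumption: disclosure is detected by exact substring match on the canary.
        "secret_disclosure": CANARY in output_text,
    }
```

Exact substring matching is deliberately conservative: it will miss paraphrased or encoded leaks, so a 0.00% disclosure rate under this kind of check means only that no marker appeared verbatim.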
About the Author

Kevin O'Connor
NSA Alum | Adlumin
Kevin O'Connor is a security researcher specializing in autonomous systems and advanced threat modeling. Drawing from his experience at the National Security Agency (NSA) and as a researcher at Adlumin, Kevin explores the convergence of AI capabilities and offensive security, focusing on latent vulnerabilities and emergent behaviors in multi-agent environments.