Evaluation Framework

The AgentKillChain harness is a Python-based execution environment for controlled tests of persistent context compromise in LLM agents.

Agent Testing Environment

The current framework evaluates one representative assistant loop while keeping the provider interface flexible. It manages LLM interactions through OpenRouter, local Ollama, or authenticated Ollama Cloud to run tests across model catalogs.

Latent State Propagation

Unlike prompt injections that trigger immediately, AgentKillChain models persistent context-risk patterns. Tests are structured into 'campaigns' across multiple controlled sessions. The framework tests whether an agent carries poisoned memory from an initial benign-looking interaction into a future sensitive task.

Automated Grading & Heuristics

After an agent responds or proposes a tool call, the framework scores unsafe behavior signals. The May 20, 2026 baselines use deterministic heuristics; LLM-as-judge scoring remains available for configurations that explicitly enable it.

Reproducible Reporting

The harness emits structured JSON and CSV artifacts at the end of every run. Public artifacts keep attack prompts public while redacting full live request and response traces, retaining metrics, flags, counts, character lengths, and SHA-256 digests for traceability.