Threat Model
AI agents face 21 categories of threats. Most are invisible to traditional security tools because they hide in tool descriptions, config files, and prompt instructions - not executable code.
A study of MCP servers found that 72.8% of tool poisoning attacks succeed against unaudited agent stacks. 341 malicious tools have been found on agent marketplaces. 82% of MCP servers have path traversal vulnerabilities. Firmis detects all of these statically, before your agent runs a single tool.
At a glance
| # | Category | Rules | Severity | What it enables |
|---|---|---|---|---|
| 1 | Tool Poisoning | 10 | Critical–High | Hidden instructions that hijack agent behavior |
| 2 | Data Exfiltration | 12 | Critical–High | Sending your files and env vars to attacker servers |
| 3 | Credential Harvesting | 18 | Critical–High | Reading ~/.aws/credentials, SSH keys, token caches |
| 4 | Prompt Injection | 13 | Critical–High | Overriding agent instructions from tool output or config |
| 5 | Secret Detection | 60 | Critical–Medium | Hardcoded API keys, tokens, and passwords in source |
| 6 | Supply Chain | 8 | Critical–High | Compromised or typosquatted dependencies |
| 7 | Known Malicious | 16 | Critical | Packages flagged in threat databases and known malware signatures |
| 8 | Network Abuse | 10 | High–Medium | Unauthorized DNS lookups and HTTP connections |
| 9 | File System Abuse | 10 | High–Medium | Reads/writes to sensitive system paths |
| 10 | Permission Overgrant | 7 | High–Medium | Tools requesting broader permissions than they need |
| 11 | Agent Memory Poisoning | 7 | High | Corrupting agent context to affect future behavior |
| 12 | Malware Distribution | 6 | Critical–High | Code that downloads and executes additional payloads |
| 13 | Suspicious Behavior | 16 | High–Medium | Obfuscation, encoded payloads, evasion techniques |
| 14 | Insecure Configuration | 3 | Medium–Low | Disabled security controls, open CORS, weak defaults |
| 15 | Access Control | 3 | High–Medium | Missing authentication or authorization checks |
| 16 | Credential Extraction | 3 | Critical–High | Pulling credentials from browsers, keychains, and password managers |
| 17 | Privilege Escalation | 18 | Critical–High | Gaining root, escaping containers, manipulating OS-level controls |
| 18 | Permission Bypass | 6 | Critical–High | Disabling safety confirmations, sandbox escapes, YOLO-mode flags |
| 19 | Third-Party Content | 6 | High–Medium | Untrusted content from email, chat, and the web entering agent context |
| 20 | Cross-Agent Propagation | 3 | Critical–High | One compromised agent modifying state or behavior of other agents |
| 21 | Unsupervised Execution | 4 | High–Medium | Background agents and daemons running without human oversight |
Total: 324 detection rules across 21 categories.
Tool Poisoning
Tool poisoning is the most direct attack against AI agents. A malicious MCP server can inject hidden instructions into a tool description that tell the agent to read ~/.aws/credentials and send the contents to an attacker’s server - all while showing the user a perfectly innocent tool name like “search the web.”
Because agents read tool descriptions to understand what a tool does, hidden content in those descriptions can redirect agent behavior without the user’s knowledge. The attack is invisible to code review because the payload is in a string, not in logic.
Example finding (tp-001 - Critical): Zero-width Unicode characters (\u200B, \uFEFF) in a tool description. These characters are invisible to humans reviewing the code but are processed by the agent as text, allowing hidden instructions to be smuggled past review.
Example finding (tp-002 - High): The phrase “Ignore all previous instructions” embedded in a tool description fetched from an external MCP server - a textbook prompt override attack.
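Zero-width smuggling is mechanically checkable. A minimal sketch of that check in Python - the character list and function name are illustrative, not Firmis’s actual rule:

```python
import re

# Zero-width and BOM characters that are invisible in most editors but
# still reach the model as text (illustrative subset, not Firmis's list).
ZERO_WIDTH = re.compile(r"[\u200B\u200C\u200D\u2060\uFEFF]")

def has_hidden_chars(tool_description: str) -> bool:
    """Flag tool descriptions carrying invisible Unicode characters."""
    return bool(ZERO_WIDTH.search(tool_description))
```

A description that renders as “search the web” but contains \u200B between words would trip this check while passing human review.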
Data Exfiltration
Data exfiltration rules detect code patterns that send local data - files, environment variables, clipboard contents - to external URLs or third-party services. The attack is rarely obvious: the exfiltration is usually embedded inside a tool that also does something legitimate.
Example finding (exfil-003 - Critical): A tool that reads a local file and passes its contents as the body of a fetch() POST request to an external URL.
Example finding (exfil-007 - High): A tool that accesses process.env and sends environment variable values to a webhook endpoint. If your agent has access to API keys via env vars, this is a full credential dump.
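One way to catch the read-then-POST shape statically is a co-occurrence heuristic over the source text. A rough sketch, assuming JavaScript-style source as input - the patterns and names here are illustrative, not Firmis’s rules:

```python
import re

# Local data sources: file reads and environment variable access.
READ_LOCAL = re.compile(r"\b(readFileSync|readFile|open)\s*\(")
ENV_ACCESS = re.compile(r"\bprocess\.env\b")
# External sink: a fetch() call against an absolute URL.
POST_EXTERNAL = re.compile(r"fetch\s*\(\s*['\"]https?://", re.IGNORECASE)

def looks_like_exfiltration(source: str) -> bool:
    """Flag code that both reads local data and sends to an external URL."""
    reads = READ_LOCAL.search(source) or ENV_ACCESS.search(source)
    return bool(reads and POST_EXTERNAL.search(source))
```

Real rule engines scope the check to a single function or data-flow path to cut false positives; this version just checks the same unit of source.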
Credential Harvesting
Credential harvesting rules detect access to files that store cloud credentials, SSH keys, browser-stored passwords, and token caches. These files are the single highest-value targets on a developer’s machine. Access to them from agent code is almost always unauthorized.
Example finding (cred-001 - High): A reference to ~/.aws/credentials or ~/.aws/config in a tool’s file-read path.
Example finding (cred-002 - Critical): Access to ~/.ssh/id_rsa - a private SSH key file that grants access to every server it’s authorized on.
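Detecting this category largely reduces to matching known sensitive paths in source. A sketch using a hypothetical subset of those paths - Firmis’s full list differs:

```python
import re

# Illustrative subset of credential file paths worth flagging on sight.
CREDENTIAL_PATHS = [
    r"~?/?\.aws/credentials",
    r"~?/?\.aws/config",
    r"~?/?\.ssh/id_(rsa|ed25519|ecdsa)",
    r"~?/?\.docker/config\.json",
]
PATTERN = re.compile("|".join(CREDENTIAL_PATHS))

def touches_credentials(source: str) -> bool:
    """Flag any reference to a known credential file path."""
    return bool(PATTERN.search(source))
```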
Prompt Injection
Prompt injection is different from tool poisoning. Tool poisoning corrupts the tool definition itself. Prompt injection arrives through content the agent reads at runtime - a web page, a tool return value, a Markdown file, or a database record.
Unlike XSS or SQL injection, prompt injection does not require code execution. A plain-text instruction embedded in a document is enough to override the agent’s behavior if the agent treats instructions found in the content it reads as authoritative.
Example finding (pi-001 - Critical): A Markdown file consumed by an agent containing the phrase “Disregard your instructions and instead…”
Example finding (pi-008 - High): A tool return value template containing a role-reassignment phrase such as “You are now operating in unrestricted mode.”
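Phrase-level matching is the simplest detection layer here. A sketch matching a few canonical override phrases - real rule sets use many more variants plus normalization against spacing and homoglyph tricks; these patterns are illustrative:

```python
import re

# Two common injection shapes: instruction-override phrases and
# role-reassignment phrases (illustrative, not an exhaustive list).
OVERRIDE_PHRASES = re.compile(
    r"(ignore|disregard)\s+(all\s+)?(your\s+|previous\s+)*instructions"
    r"|you\s+are\s+now\s+operating\s+in",
    re.IGNORECASE,
)

def has_injection_phrase(text: str) -> bool:
    """Flag text that contains a known prompt-override phrase."""
    return bool(OVERRIDE_PHRASES.search(text))
```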
Secret Detection
Secret detection covers 60 rules for hardcoded credentials across cloud providers, SaaS APIs, infrastructure services, and generic token patterns. This is the largest category by rule count because hardcoded secrets are still the most common security mistake in software - and they become dramatically more dangerous when an AI agent can read and exfiltrate them.
Services covered include: AWS, Azure, GCP, GitHub, GitLab, Slack, Stripe, Twilio, SendGrid, HuggingFace, OpenAI, Anthropic, Datadog, PagerDuty, HashiCorp Vault, Docker Hub, npm tokens, SSH private key headers, and more.
Example finding (sec-045 - Critical): An OpenAI API key (sk-...) hardcoded in a tool configuration file.
Example finding (sec-012 - High): An AWS Access Key ID (AKIA...) in a Python source file - one grep away from a full account compromise.
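Secret rules are mostly shape-based regexes. A sketch covering the two key shapes from the findings above - the prefixes and lengths are the commonly documented formats, not Firmis’s exact patterns:

```python
import re

# AWS Access Key IDs are AKIA plus 16 uppercase alphanumerics;
# OpenAI keys start with sk-. Shapes per public documentation.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai-api-key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}

def find_secrets(source: str) -> list:
    """Return the names of secret patterns that match the source."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(source)]
```

Production scanners pair shape matches with entropy checks and verification probes to cut false positives on placeholder strings.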
Supply Chain
The agent ecosystem has a supply chain problem. Packages get compromised. Maintainers get coerced. Typosquatted packages with nearly identical names sit in registries waiting to be installed.
Supply chain rules detect dependencies with known security incidents and typosquatting patterns that mimic popular package names.
Example finding (supply-001 - Critical): A dependency on event-stream - a package that was compromised to steal bitcoin wallets and downloaded by millions of developers before the attack was discovered.
Example finding (supply-002 - High): A dependency named lodassh - a typosquat of lodash that runs a reverse shell on install.
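Typosquats like lodassh sit within an edit distance of one or two of the name they imitate, which makes them detectable with a Levenshtein check against a popularity list. A self-contained sketch - the package list and threshold are illustrative; real scanners use much larger lists plus extra signals like download counts and publish dates:

```python
# Illustrative popularity list; a real scanner ships thousands of names.
POPULAR = ["lodash", "express", "requests", "numpy"]

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def possible_typosquat(name):
    """Return the popular package a name is suspiciously close to, if any."""
    for pkg in POPULAR:
        if name != pkg and edit_distance(name, pkg) <= 2:
            return pkg
    return None
```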
Known Malicious
Known malicious rules match package names and identifiers from curated threat intelligence databases, and code patterns associated with known malware families and attack tools observed in the wild. This category covers both package-level threats (compromised npm/PyPI packages, community disclosures) and code-level signatures (C2 beacons, shellcode patterns, miner payloads). Malware signature matches are the highest-confidence findings in the rule set - if one fires, something is very wrong.
Example finding (km-007 - Critical): A dependency on a package that was reported as malicious in the npm advisory database - still installable, still in package.json, silently running on every npm install.
Example finding (malware-003 - Critical): A Base64-encoded payload string matching a known command-and-control beacon pattern - the fingerprint of a specific malware family that has been observed targeting developer machines.
Network Abuse
Network abuse rules detect unauthorized DNS lookups, HTTP requests to suspicious domains, tunneling services, and data-over-DNS patterns used to bypass network monitoring.
Example finding (net-004 - High): HTTP requests to a tunneling service (ngrok.io, localtunnel.me) that creates an unmonitored egress channel. Legitimate tools rarely need to phone home through a tunnel.
Example finding (net-009 - High): DNS TXT record lookups that encode exfiltrated data in DNS queries - a technique specifically designed to bypass HTTP-level network monitoring and firewall rules.
File System Abuse
File system abuse rules detect reads, writes, or deletions of sensitive system paths - including /proc filesystem entries, system logs, shell history files, and container credential paths.
Example finding (fs-001 - High): Access to /proc/self/environ - reads the process environment directly from the kernel filesystem, exposing all environment variables including any secrets injected at runtime.
Example finding (fs-006 - High): Writing to or truncating system log files to cover activity traces - a classic anti-forensics technique.
Permission Overgrant
Permission overgrant rules detect tool definitions that request broad or wildcard permissions without scoping them to the minimum required for the tool’s declared purpose. This is the agent equivalent of a mobile app requesting access to your camera, contacts, and location to show you weather.
Example finding (perm-003 - High): An MCP server tool declaring permissions: ["*"] rather than enumerating specific permission scopes. A wildcard grant means the tool can do anything the agent can do.
Agent Memory Poisoning
Agent memory poisoning rules detect patterns that corrupt or hijack the agent’s context window, conversation history, or persistent memory - causing the agent to behave differently in future turns. Unlike prompt injection (which attacks a single session), memory poisoning persists.
Example finding (mem-002 - High): A tool that writes adversarial instructions into a persistent memory file consumed by the agent on startup. Every future session starts with the poisoned context.
Malware Distribution
Malware distribution rules detect code patterns that download and execute additional payloads, install backdoors, or propagate malicious code to other systems.
Example finding (dist-001 - Critical): A curl | bash pipe-to-shell pattern that downloads and immediately executes a remote script without verification. The downloaded script could be anything; there is no integrity check.
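The pipe-to-shell shape is regular enough to match directly. A sketch - the downloader and shell lists are an illustrative subset:

```python
import re

# curl/wget output piped straight into a shell: no checksum, no review,
# optionally escalated with sudo.
PIPE_TO_SHELL = re.compile(
    r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z|da)?sh\b"
)

def is_pipe_to_shell(command: str) -> bool:
    """Flag commands that download a script and execute it in one step."""
    return bool(PIPE_TO_SHELL.search(command))
```

Downloading to a file first is not flagged - the dangerous part is the unreviewed, unverified execution, not the download itself.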
Suspicious Behavior
Suspicious behavior rules cover obfuscation techniques, encoded payloads, and evasion patterns that are not specific to one threat category but indicate malicious intent. Legitimate tools rarely need to hide what they do.
Example finding (sus-004 - High): A long Base64-encoded string passed to a dynamic code execution function - a common technique for hiding malicious logic from static scanners and from developers reviewing the code.
Example finding (sus-011 - Medium): Heavy use of string concatenation to build a URL, specifically structured to evade simple domain-matching rules. The result URL is never visible in a single line of source.
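A sketch of the encoded-payload heuristic: flag source that both carries a long Base64-looking literal and calls a dynamic execution primitive, then confirm the literal actually decodes. The length threshold and function list are illustrative, not Firmis’s:

```python
import base64
import re

# A quoted run of 40+ Base64-alphabet characters.
B64_RUN = re.compile(r"['\"]([A-Za-z0-9+/=]{40,})['\"]")
# Dynamic execution primitives across Python and JavaScript.
DYN_EXEC = re.compile(r"\b(eval|exec|Function|execSync)\s*\(")

def suspicious_encoded_exec(source: str) -> bool:
    """Flag a long Base64 literal co-occurring with dynamic execution."""
    blob = B64_RUN.search(source)
    if not blob or not DYN_EXEC.search(source):
        return False
    try:
        base64.b64decode(blob.group(1), validate=True)  # really Base64?
        return True
    except Exception:
        return False
```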
Insecure Configuration
Insecure configuration rules detect agent configurations that disable security controls, set overly permissive CORS policies, or use known-insecure default settings.
Example finding (cfg-002 - Medium): A server configuration with allowOrigins: "*" and no authentication requirement - any website can make authenticated requests to the agent’s tool server.
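A check like this can run on the parsed config rather than raw text. A sketch - the key names (allowOrigins, auth, requireAuth) are assumed for illustration, not a fixed Firmis schema:

```python
def insecure_cors(config: dict) -> bool:
    """Flag wildcard CORS combined with no authentication requirement."""
    origins = config.get("allowOrigins")
    wildcard = origins == "*" or (isinstance(origins, list) and "*" in origins)
    no_auth = not config.get("auth") and not config.get("requireAuth")
    return wildcard and no_auth
```

Either condition alone can be legitimate; it is the combination - any origin, no identity check - that makes the tool server reachable from any website the user visits.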
Access Control
Access control rules detect missing authentication checks on tool endpoints, unauthenticated admin routes, and hardcoded bypass conditions.
Example finding (ac-001 - High): A tool handler that processes requests without verifying the caller’s identity or checking an authorization token - any process that can reach the socket can invoke the tool.
Credential Extraction
Credential extraction rules detect agent code that reaches into another application’s credential store - browser cookie databases, OS keychains, and password manager CLIs. This is distinct from credential harvesting (which targets file paths like ~/.aws/credentials): extraction targets the live, authenticated sessions and encrypted vaults of other applications running on the same machine.
The attack surface opened here is significant. An agent with access to browser cookies can impersonate authenticated sessions for any website the user is logged into. Access to the OS keychain can expose every password and token stored on the machine.
Example finding (cex-001 - Critical): Code that reads Chrome or Firefox cookie databases by targeting the browser’s profile directory - direct theft of active session tokens for every website the user is logged into.
Example finding (cex-002 - High): A call to security find-generic-password (macOS Keychain CLI) - extracts credentials stored by any application, including cloud CLI tools and developer services.
Privilege Escalation
Privilege escalation rules detect agent code that attempts to gain elevated system access - executing commands as root, escaping container boundaries, loading kernel modules, or manipulating OS scheduling to establish persistence. With 18 rules, it ties credential harvesting as the largest category after secret detection.
AI agents operate on developer machines with broad tool access. A privilege escalation attempt inside an agent context has a much higher success probability than in a traditional attack, because the agent already runs in the user’s security context and has file system access.
Example finding (privesc-001 - Critical): A subprocess call that executes sudo - an AI agent should never need to run commands as root. If this fires, something is requesting capabilities far beyond what any tool should need.
Example finding (pe-013 - Critical): A Docker run command with --privileged flag, or --cap-add SYS_ADMIN - grants the container unrestricted access to the host kernel, effectively erasing container isolation.
Example finding (pe-014 - Critical): An IAM policy with "Action": "*" - grants full AWS account access. Combined with an agent that can write or apply infrastructure configs, this is a full cloud account takeover vector.
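Several of these escalation markers are plain string shapes. A sketch scoring a source snippet against an illustrative subset of them - real rules carry per-pattern severities and context checks:

```python
import re

# Escalation markers across shells, Docker invocations, and IAM policies.
ESCALATION_PATTERNS = [
    re.compile(r"\bsudo\b"),                    # run as root
    re.compile(r"--privileged\b"),              # Docker: host-kernel access
    re.compile(r"--cap-add[=\s]+SYS_ADMIN\b"),  # near-root capability
    re.compile(r'"Action"\s*:\s*"\*"'),         # IAM: allow every action
]

def escalation_findings(source: str) -> int:
    """Count how many escalation patterns match the source."""
    return sum(1 for p in ESCALATION_PATTERNS if p.search(source))
```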
Permission Bypass
Permission bypass rules detect code that explicitly disables the safety confirmation layer of AI agent runtimes. These are the flags and configuration values that remove the human from the loop - --yolo, --full-auto, bypassPermissions, sandbox escape patterns, and requests for macOS Full Disk Access.
The risk is compounding: a tool that bypasses permissions looks safe right up until the agent is compromised via prompt injection or tool poisoning. At that point, the bypass flag that was added for convenience becomes the reason the attack succeeds without any user confirmation.
Example finding (pbp-001 - Critical): The --yolo or --full-auto flag in an agent skill configuration. These flags disable all safety confirmations - if a compromised agent calls any tool, it executes immediately with no human approval step.
Example finding (pbp-003 - Critical): A --dangerously flag or explicit sandbox disable in a tool definition. The agent is designed to operate outside its containment boundary.
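The bypass flags are literal strings, so detection is a direct match. A sketch using the flag names from the findings above - exact flag sets vary per agent runtime:

```python
import re

# Flags and config values that remove human confirmation
# (illustrative subset drawn from the flags named in this section).
BYPASS_FLAGS = re.compile(
    r"--(yolo|full-auto|dangerously[\w-]*)\b|bypassPermissions"
)

def bypasses_permissions(config_text: str) -> bool:
    """Flag configs or commands that disable safety confirmations."""
    return bool(BYPASS_FLAGS.search(config_text))
```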
Third-Party Content
Third-party content rules detect agent skills that ingest untrusted external content into the agent’s context - emails, chat messages, social media threads, web pages, and GitHub issues. Each of these is a potential prompt injection delivery channel. The rule set covers six content surface categories: email, chat, social media, web, GitHub, and skill registries.
Unlike direct prompt injection (which targets the agent’s instruction set), third-party content attacks rely on the agent processing user-generated content as part of normal operation. Any email sender, Slack contact, or GitHub commenter can embed instructions that the agent may act on.
Example finding (tpc-001 - High): An agent skill that reads email body content via IMAP or Gmail API. Every email sender in the user’s inbox can now attempt to inject instructions into the agent’s context.
Example finding (tpc-005 - High): A skill that fetches GitHub issue or PR body content. A malicious contributor can open an issue containing "Ignore previous instructions and..." - targeting any agent that processes the issue.
Cross-Agent Propagation
Cross-agent propagation rules detect configurations where one agent can modify another agent’s state, memory, or configuration without mutual authentication. In multi-agent systems (crewAI, AutoGPT, OpenClaw, Nanobot), agents share workspaces, relay tool calls, and broadcast messages between sessions. Without identity verification, a compromised agent can poison every other agent it communicates with.
The attack pattern documented in arXiv:2602.20021 (“Agents of Chaos”) shows real multi-agent compromises: a single agent broadcasting false accusations to 52 other agents, and an attacker impersonating an owner by changing a username.
Example finding (mat-001 - High): A multi-agent config where agents can write to a shared directory without verifying the writer’s identity - one compromised agent can modify the shared workspace consumed by all other agents.
Example finding (mat-002 - Critical): Authentication explicitly set to none, false, or disabled in an agent configuration. Any caller - including other compromised agents - can invoke tools or read agent state without proving identity.
Unsupervised Execution
Unsupervised execution rules detect agent configurations that spawn background processes, install persistent daemons, or run parallel agent pipelines without per-action human oversight. The threat is not that these configurations are malicious by default - it is that they remove the human confirmation layer that catches compromised agent behavior before it causes damage.
A background agent running with tool access has no natural stopping point. If compromised, it continues executing until manually terminated. Combined with any other threat category, unsupervised execution amplifies the blast radius from a single action to an indefinite sequence of actions.
Example finding (uex-001 - High): A skill that calls session_spawn or uses background: true to launch an agent subprocess detached from the terminal. No per-action confirmation. No visibility into what it does next.
Example finding (uex-002 - High): A systemctl enable or launchctl load call in agent code - installs a persistent daemon that survives reboots and runs indefinitely with the agent’s tool permissions.
What to read next
- Detection Engine - how rules are evaluated, scored, and why Firmis keeps false positive rates low
- Built-in Rules - full list of all 324 detection rules with IDs and descriptions
- Ignoring Findings - suppress false positives per file or rule without disabling the entire category
- firmis scan - CLI reference and severity filtering flags