Threat Model
AI agents face 21 categories of threats. Most are invisible to traditional security tools because they hide in tool descriptions, config files, and prompt instructions - not executable code.
A study of MCP servers found that 72.8% of tool poisoning attacks succeed against unaudited agent stacks. 341 malicious tools have been found on agent marketplaces. 82% of MCP servers have path traversal vulnerabilities. Firmis detects all of these statically, before your agent runs a single tool.
At a glance
| # | Category | Rules | Severity | What it enables |
|---|---|---|---|---|
| 1 | Tool Poisoning | 10 | Critical–High | Hidden instructions that hijack agent behavior |
| 2 | Data Exfiltration | 12 | Critical–High | Sending your files and env vars to attacker servers |
| 3 | Credential Harvesting | 18 | Critical–High | Reading ~/.aws/credentials, SSH keys, token caches |
| 4 | Prompt Injection | 13 | Critical–High | Overriding agent instructions from tool output or config |
| 5 | Secret Detection | 60 | Critical–Medium | Hardcoded API keys, tokens, and passwords in source |
| 6 | Supply Chain | 8 | Critical–High | Compromised or typosquatted dependencies |
| 7 | Known Malicious | 16 | Critical | Packages flagged in threat databases and known malware signatures |
| 8 | Network Abuse | 10 | High–Medium | Unauthorized DNS lookups and HTTP connections |
| 9 | File System Abuse | 10 | High–Medium | Reads/writes to sensitive system paths |
| 10 | Permission Overgrant | 7 | High–Medium | Tools requesting broader permissions than they need |
| 11 | Agent Memory Poisoning | 7 | High | Corrupting agent context to affect future behavior |
| 12 | Malware Distribution | 6 | Critical–High | Code that downloads and executes additional payloads |
| 13 | Suspicious Behavior | 16 | High–Medium | Obfuscation, encoded payloads, evasion techniques |
| 14 | Insecure Configuration | 3 | Medium–Low | Disabled security controls, open CORS, weak defaults |
| 15 | Access Control | 3 | High–Medium | Missing authentication or authorization checks |
| 16 | Credential Extraction | 3 | Critical–High | Pulling credentials from browsers, keychains, and password managers |
| 17 | Privilege Escalation | 18 | Critical–High | Gaining root, escaping containers, manipulating OS-level controls |
| 18 | Permission Bypass | 6 | Critical–High | Disabling safety confirmations, sandbox escapes, YOLO-mode flags |
| 19 | Third-Party Content | 6 | High–Medium | Untrusted content from email, chat, and the web entering agent context |
| 20 | Cross-Agent Propagation | 3 | Critical–High | One compromised agent modifying state or behavior of other agents |
| 21 | Unsupervised Execution | 4 | High–Medium | Background agents and daemons running without human oversight |
Total: 324 detection rules across 21 categories.
Tool Poisoning
Tool poisoning is the most direct attack against AI agents. A malicious MCP server can inject hidden instructions into a tool description that tell the agent to read ~/.aws/credentials and send the contents to an attacker’s server - all while showing the user a perfectly innocent tool name like “search the web.”
Because agents read tool descriptions to understand what a tool does, hidden content in those descriptions can redirect agent behavior without the user’s knowledge. The attack is invisible to code review because the payload is in a string, not in logic.
Example finding (tp-001 - Critical): Zero-width Unicode characters (\u200B, \uFEFF) in a tool description. These characters are invisible to humans reviewing the code but are processed by the agent as text, allowing hidden instructions to be smuggled past review.
Example finding (tp-002 - High): The phrase “Ignore all previous instructions” embedded in a tool description fetched from an external MCP server - a textbook prompt override attack.
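Zero-width smuggling is mechanically checkable. A minimal sketch of that check in Python - the character list and function name are illustrative, not Firmis’s actual rule:

```python
import re

# Zero-width and BOM characters that are invisible in most editors but
# still reach the model as text (illustrative subset, not Firmis's list).
ZERO_WIDTH = re.compile(r"[\u200B\u200C\u200D\u2060\uFEFF]")

def has_hidden_chars(tool_description: str) -> bool:
    """Flag tool descriptions carrying invisible Unicode characters."""
    return bool(ZERO_WIDTH.search(tool_description))
```

A description that renders as “search the web” but contains \u200B between words would trip this check while passing human review.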
Data Exfiltration
Data exfiltration rules detect code patterns that send local data - files, environment variables, clipboard contents - to external URLs or third-party services. The attack is rarely obvious: the exfiltration is usually embedded inside a tool that also does something legitimate.
Example finding (exfil-003 - Critical): A tool that reads a local file and passes its contents as the body of a fetch() POST request to an external URL.
Example finding (exfil-007 - High): A tool that accesses process.env and sends environment variable values to a webhook endpoint. If your agent has access to API keys via env vars, this is a full credential dump.
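One way to catch the read-then-POST shape statically is a co-occurrence heuristic over the source text. A rough sketch, assuming JavaScript-style source as input - the patterns and names here are illustrative, not Firmis’s rules:

```python
import re

# Local data sources: file reads and environment variable access.
READ_LOCAL = re.compile(r"\b(readFileSync|readFile|open)\s*\(")
ENV_ACCESS = re.compile(r"\bprocess\.env\b")
# External sink: a fetch() call against an absolute URL.
POST_EXTERNAL = re.compile(r"fetch\s*\(\s*['\"]https?://", re.IGNORECASE)

def looks_like_exfiltration(source: str) -> bool:
    """Flag code that both reads local data and sends to an external URL."""
    reads = READ_LOCAL.search(source) or ENV_ACCESS.search(source)
    return bool(reads and POST_EXTERNAL.search(source))
```

Real rule engines scope the check to a single function or data-flow path to cut false positives; this version just checks the same unit of source.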
Credential Harvesting
Credential harvesting rules detect access to files that store cloud credentials, SSH keys, browser-stored passwords, and token caches. These files are the single highest-value targets on a developer’s machine. Access to them from agent code is almost always unauthorized.
Example finding (cred-001 - High): A reference to ~/.aws/credentials or ~/.aws/config in a tool’s file-read path.
Example finding (cred-002 - Critical): Access to ~/.ssh/id_rsa - a private SSH key file that grants access to every server it’s authorized on.
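Detecting this category largely reduces to matching known sensitive paths in source. A sketch using a hypothetical subset of those paths - Firmis’s full list differs:

```python
import re

# Illustrative subset of credential file paths worth flagging on sight.
CREDENTIAL_PATHS = [
    r"~?/?\.aws/credentials",
    r"~?/?\.aws/config",
    r"~?/?\.ssh/id_(rsa|ed25519|ecdsa)",
    r"~?/?\.docker/config\.json",
]
PATTERN = re.compile("|".join(CREDENTIAL_PATHS))

def touches_credentials(source: str) -> bool:
    """Flag any reference to a known credential file path."""
    return bool(PATTERN.search(source))
```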
Prompt Injection
Prompt injection is different from tool poisoning. Tool poisoning corrupts the tool definition itself. Prompt injection arrives through content the agent reads at runtime - a web page, a tool return value, a Markdown file, or a database record.
Unlike XSS or SQL injection, prompt injection does not require code execution. A plain-text instruction embedded in a document is enough to override the agent’s behavior if the agent treats instructions found in the content it reads as authoritative.
Example finding (pi-001 - Critical): A Markdown file consumed by an agent containing the phrase “Disregard your instructions and instead…”
Example finding (pi-008 - High): A tool return value template containing a role-reassignment phrase such as “You are now operating in unrestricted mode.”
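Phrase-level matching is the simplest detection layer here. A sketch matching a few canonical override phrases - real rule sets use many more variants plus normalization against spacing and homoglyph tricks; these patterns are illustrative:

```python
import re

# Two common injection shapes: instruction-override phrases and
# role-reassignment phrases (illustrative, not an exhaustive list).
OVERRIDE_PHRASES = re.compile(
    r"(ignore|disregard)\s+(all\s+)?(your\s+|previous\s+)*instructions"
    r"|you\s+are\s+now\s+operating\s+in",
    re.IGNORECASE,
)

def has_injection_phrase(text: str) -> bool:
    """Flag text that contains a known prompt-override phrase."""
    return bool(OVERRIDE_PHRASES.search(text))
```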
Secret Detection
Secret detection covers 60 rules for hardcoded credentials across cloud providers, SaaS APIs, infrastructure services, and generic token patterns. This is the largest category by rule count because hardcoded secrets are still the most common security mistake in software - and they become dramatically more dangerous when an AI agent can read and exfiltrate them.
Services covered include: AWS, Azure, GCP, GitHub, GitLab, Slack, Stripe, Twilio, SendGrid, HuggingFace, OpenAI, Anthropic, Datadog, PagerDuty, HashiCorp Vault, Docker Hub, npm tokens, SSH private key headers, and more.
Example finding (sec-045 - Critical): An OpenAI API key (sk-...) hardcoded in a tool configuration file.
Example finding (sec-012 - High): An AWS Access Key ID (AKIA...) in a Python source file - one grep away from a full account compromise.
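Secret rules are mostly shape-based regexes. A sketch covering the two key shapes from the findings above - the prefixes and lengths are the commonly documented formats, not Firmis’s exact patterns:

```python
import re

# AWS Access Key IDs are AKIA plus 16 uppercase alphanumerics;
# OpenAI keys start with sk-. Shapes per public documentation.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai-api-key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}

def find_secrets(source: str) -> list:
    """Return the names of secret patterns that match the source."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(source)]
```

Production scanners pair shape matches with entropy checks and verification probes to cut false positives on placeholder strings.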
Supply Chain
The agent ecosystem has a supply chain problem. Packages get compromised. Maintainers get coerced. Typosquatted packages with nearly identical names sit in registries waiting to be installed.
Supply chain rules detect dependencies with known security incidents and typosquatting patterns that mimic popular package names.
Example finding (supply-001 - Critical): A dependency on event-stream - a package that was compromised to steal bitcoin wallets and downloaded by millions of developers before the attack was discovered.
Example finding (supply-002 - High): A dependency named lodassh - a typosquat of lodash that runs a reverse shell on install.
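Typosquats like lodassh sit within an edit distance of one or two of the name they imitate, which makes them detectable with a Levenshtein check against a popularity list. A self-contained sketch - the package list and threshold are illustrative; real scanners use much larger lists plus extra signals like download counts and publish dates:

```python
# Illustrative popularity list; a real scanner ships thousands of names.
POPULAR = ["lodash", "express", "requests", "numpy"]

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def possible_typosquat(name):
    """Return the popular package a name is suspiciously close to, if any."""
    for pkg in POPULAR:
        if name != pkg and edit_distance(name, pkg) <= 2:
            return pkg
    return None
```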
Known Malicious
Known malicious rules match package names and identifiers from curated threat intelligence databases, and code patterns associated with known malware families and attack tools observed in the wild. This category covers both package-level threats (compromised npm/PyPI packages, community disclosures) and code-level signatures (C2 beacons, shellcode patterns, miner payloads). Malware signature matches are the highest-confidence findings in the rule set - if one fires, something is very wrong.
Example finding (km-007 - Critical): A dependency on a package that was reported as malicious in the npm advisory database - still installable, still in package.json, silently running on every npm install.
Example finding (malware-003 - Critical): A Base64-encoded payload string matching a known command-and-control beacon pattern - the fingerprint of a specific malware family that has been observed targeting developer machines.
Network Abuse
Network abuse rules detect unauthorized DNS lookups, HTTP requests to suspicious domains, tunneling services, and data-over-DNS patterns used to bypass network monitoring.
Example finding (net-004 - High): HTTP requests to a tunneling service (ngrok.io, localtunnel.me) that creates an unmonitored egress channel. Legitimate tools rarely need to phone home through a tunnel.
Example finding (net-009 - High): DNS TXT record lookups that encode exfiltrated data in DNS queries - a technique specifically designed to bypass HTTP-level network monitoring and firewall rules.
File System Abuse
File system abuse rules detect reads, writes, or deletions of sensitive system paths - including /proc filesystem entries, system logs, shell history files, and container credential paths.
Example finding (fs-001 - High): Access to /proc/self/environ - reads the process environment directly from the kernel filesystem, exposing all environment variables including any secrets injected at runtime.
Example finding (fs-006 - High): Writing to or truncating system log files to cover activity traces - a classic anti-forensics technique.
Permission Overgrant
Permission overgrant rules detect tool definitions that request broad or wildcard permissions without scoping them to the minimum required for the tool’s declared purpose. This is the agent equivalent of a mobile app requesting access to your camera, contacts, and location to show you weather.
Example finding (perm-003 - High): An MCP server tool declaring permissions: ["*"] rather than enumerating specific permission scopes. A wildcard grant means the tool can do anything the agent can do.
Agent Memory Poisoning
Agent memory poisoning rules detect patterns that corrupt or hijack the agent’s context window, conversation history, or persistent memory - causing the agent to behave differently in future turns. Unlike prompt injection (which attacks a single session), memory poisoning persists.
Example finding (mem-002 - High): A tool that writes adversarial instructions into a persistent memory file consumed by the agent on startup. Every future session starts with the poisoned context.
Malware Distribution
Malware distribution rules detect code patterns that download and execute additional payloads, install backdoors, or propagate malicious code to other systems.
Example finding (dist-001 - Critical): A curl | bash pipe-to-shell pattern that downloads and immediately executes a remote script without verification. The downloaded script could be anything; there is no integrity check.
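The pipe-to-shell shape is regular enough to match directly. A sketch - the downloader and shell lists are an illustrative subset:

```python
import re

# curl/wget output piped straight into a shell: no checksum, no review,
# optionally escalated with sudo.
PIPE_TO_SHELL = re.compile(
    r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z|da)?sh\b"
)

def is_pipe_to_shell(command: str) -> bool:
    """Flag commands that download a script and execute it in one step."""
    return bool(PIPE_TO_SHELL.search(command))
```

Downloading to a file first is not flagged - the dangerous part is the unreviewed, unverified execution, not the download itself.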
Suspicious Behavior
Suspicious behavior rules cover obfuscation techniques, encoded payloads, and evasion patterns that are not specific to one threat category but indicate malicious intent. Legitimate tools rarely need to hide what they do.
Example finding (sus-004 - High): A long Base64-encoded string passed to a dynamic code execution function - a common technique for hiding malicious logic from static scanners and from developers reviewing the code.
Example finding (sus-011 - Medium): Heavy use of string concatenation to build a URL, specifically structured to evade simple domain-matching rules. The result URL is never visible in a single line of source.
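A sketch of the encoded-payload heuristic: flag source that both carries a long Base64-looking literal and calls a dynamic execution primitive, then confirm the literal actually decodes. The length threshold and function list are illustrative, not Firmis’s:

```python
import base64
import re

# A quoted run of 40+ Base64-alphabet characters.
B64_RUN = re.compile(r"['\"]([A-Za-z0-9+/=]{40,})['\"]")
# Dynamic execution primitives across Python and JavaScript.
DYN_EXEC = re.compile(r"\b(eval|exec|Function|execSync)\s*\(")

def suspicious_encoded_exec(source: str) -> bool:
    """Flag a long Base64 literal co-occurring with dynamic execution."""
    blob = B64_RUN.search(source)
    if not blob or not DYN_EXEC.search(source):
        return False
    try:
        base64.b64decode(blob.group(1), validate=True)  # really Base64?
        return True
    except Exception:
        return False
```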
Insecure Configuration
Insecure configuration rules detect agent configurations that disable security controls, set overly permissive CORS policies, or use known-insecure default settings.
Example finding (cfg-002 - Medium): A server configuration with allowOrigins: "*" and no authentication requirement - any website can make authenticated requests to the agent’s tool server.
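A check like this can run on the parsed config rather than raw text. A sketch - the key names (allowOrigins, auth, requireAuth) are assumed for illustration, not a fixed Firmis schema:

```python
def insecure_cors(config: dict) -> bool:
    """Flag wildcard CORS combined with no authentication requirement."""
    origins = config.get("allowOrigins")
    wildcard = origins == "*" or (isinstance(origins, list) and "*" in origins)
    no_auth = not config.get("auth") and not config.get("requireAuth")
    return wildcard and no_auth
```

Either condition alone can be legitimate; it is the combination - any origin, no identity check - that makes the tool server reachable from any website the user visits.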
Access Control
Access control rules detect missing authentication checks on tool endpoints, unauthenticated admin routes, and hardcoded bypass conditions.
Example finding (ac-001 - High): A tool handler that processes requests without verifying the caller’s identity or checking an authorization token - any process that can reach the socket can invoke the tool.
Credential Extraction
Credential extraction rules detect agent code that reaches into another application’s credential store - browser cookie databases, OS keychains, and password manager CLIs. This is distinct from credential harvesting (which targets file paths like ~/.aws/credentials): extraction targets the live, authenticated sessions and encrypted vaults of other applications running on the same machine.
The attack surface opened here is significant. An agent with access to browser cookies can impersonate authenticated sessions for any website the user is logged into. Access to the OS keychain can expose every password and token stored on the machine.
Example finding (cex-001 - Critical): Code that reads Chrome or Firefox cookie databases by targeting the browser’s profile directory - direct theft of active session tokens for every website the user is logged into.
Example finding (cex-002 - High): A call to security find-generic-password (macOS Keychain CLI) - extracts credentials stored by any application, including cloud CLI tools and developer services.
Privilege Escalation
Privilege escalation rules detect agent code that attempts to gain elevated system access - executing commands as root, escaping container boundaries, loading kernel modules, or manipulating OS scheduling to establish persistence. With 18 rules, it ties credential harvesting as the largest category after secret detection.
AI agents operate on developer machines with broad tool access. A privilege escalation attempt inside an agent context has a much higher success probability than in a traditional attack, because the agent already runs in the user’s security context and has file system access.
Example finding (privesc-001 - Critical): A subprocess call that executes sudo - an AI agent should never need to run commands as root. If this fires, something is requesting capabilities far beyond what any tool should need.
Example finding (pe-013 - Critical): A Docker run command with --privileged flag, or --cap-add SYS_ADMIN - grants the container unrestricted access to the host kernel, effectively erasing container isolation.
Example finding (pe-014 - Critical): An IAM policy with "Action": "*" - grants full AWS account access. Combined with an agent that can write or apply infrastructure configs, this is a full cloud account takeover vector.
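Several of these escalation markers are plain string shapes. A sketch scoring a source snippet against an illustrative subset of them - real rules carry per-pattern severities and context checks:

```python
import re

# Escalation markers across shells, Docker invocations, and IAM policies.
ESCALATION_PATTERNS = [
    re.compile(r"\bsudo\b"),                    # run as root
    re.compile(r"--privileged\b"),              # Docker: host-kernel access
    re.compile(r"--cap-add[=\s]+SYS_ADMIN\b"),  # near-root capability
    re.compile(r'"Action"\s*:\s*"\*"'),         # IAM: allow every action
]

def escalation_findings(source: str) -> int:
    """Count how many escalation patterns match the source."""
    return sum(1 for p in ESCALATION_PATTERNS if p.search(source))
```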
Permission Bypass
Permission bypass rules detect code that explicitly disables the safety confirmation layer of AI agent runtimes. These are the flags and configuration values that remove the human from the loop - --yolo, --full-auto, bypassPermissions, sandbox escape patterns, and requests for macOS Full Disk Access.
The risk is compounding: a tool that bypasses permissions looks safe right up until the agent is compromised via prompt injection or tool poisoning. At that point, the bypass flag that was added for convenience becomes the reason the attack succeeds without any user confirmation.
Example finding (pbp-001 - Critical): The --yolo or --full-auto flag in an agent skill configuration. These flags disable all safety confirmations - if a compromised agent calls any tool, it executes immediately with no human approval step.
Example finding (pbp-003 - Critical): A --dangerously flag or explicit sandbox disable in a tool definition. The agent is designed to operate outside its containment boundary.
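The bypass flags are literal strings, so detection is a direct match. A sketch using the flag names from the findings above - exact flag sets vary per agent runtime:

```python
import re

# Flags and config values that remove human confirmation
# (illustrative subset drawn from the flags named in this section).
BYPASS_FLAGS = re.compile(
    r"--(yolo|full-auto|dangerously[\w-]*)\b|bypassPermissions"
)

def bypasses_permissions(config_text: str) -> bool:
    """Flag configs or commands that disable safety confirmations."""
    return bool(BYPASS_FLAGS.search(config_text))
```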
Third-Party Content
Third-party content rules detect agent skills that ingest untrusted external content into the agent’s context - emails, chat messages, social media threads, web pages, and GitHub issues. Each of these is a potential prompt injection delivery channel. The rule set covers six content surface categories: email, chat, social media, web, GitHub, and skill registries.
Unlike direct prompt injection (which targets the agent’s instruction set), third-party content attacks rely on the agent processing user-generated content as part of normal operation. Any email sender, Slack contact, or GitHub commenter can embed instructions that the agent may act on.
Example finding (tpc-001 - High): An agent skill that reads email body content via IMAP or Gmail API. Every email sender in the user’s inbox can now attempt to inject instructions into the agent’s context.
Example finding (tpc-005 - High): A skill that fetches GitHub issue or PR body content. A malicious contributor can open an issue containing "Ignore previous instructions and..." - targeting any agent that processes the issue.
Cross-Agent Propagation
Cross-agent propagation rules detect configurations where one agent can modify another agent’s state, memory, or configuration without mutual authentication. In multi-agent systems (crewAI, AutoGPT, OpenClaw, Nanobot), agents share workspaces, relay tool calls, and broadcast messages between sessions. Without identity verification, a compromised agent can poison every other agent it communicates with.
The attack pattern documented in arXiv:2602.20021 (“Agents of Chaos”) shows real multi-agent compromises: a single agent broadcasting false accusations to 52 other agents, and an attacker impersonating an owner by changing a username.
Example finding (mat-001 - High): A multi-agent config where agents can write to a shared directory without verifying the writer’s identity - one compromised agent can modify the shared workspace consumed by all other agents.
Example finding (mat-002 - Critical): Authentication explicitly set to none, false, or disabled in an agent configuration. Any caller - including other compromised agents - can invoke tools or read agent state without proving identity.
Unsupervised Execution
Unsupervised execution rules detect agent configurations that spawn background processes, install persistent daemons, or run parallel agent pipelines without per-action human oversight. The threat is not that these configurations are malicious by default - it is that they remove the human confirmation layer that catches compromised agent behavior before it causes damage.
A background agent running with tool access has no natural stopping point. If compromised, it continues executing until manually terminated. Combined with any other threat category, unsupervised execution amplifies the blast radius from a single action to an indefinite sequence of actions.
Example finding (uex-001 - High): A skill that calls session_spawn or uses background: true to launch an agent subprocess detached from the terminal. No per-action confirmation. No visibility into what it does next.
Example finding (uex-002 - High): A systemctl enable or launchctl load call in agent code - installs a persistent daemon that survives reboots and runs indefinitely with the agent’s tool permissions.
What to read next
- Detection Engine - how rules are evaluated, scored, and why Firmis keeps false positive rates low
- Built-in Rules - full list of all 324 detection rules with IDs and descriptions
- Ignoring Findings - suppress false positives per file or rule without disabling the entire category
- firmis scan - CLI reference and severity filtering flags