Threat Model

AI agents face 16 categories of threats. Most are invisible to traditional security tools because they hide in tool descriptions, config files, and prompt instructions — not executable code.

A study of MCP servers found that 72.8% of tool poisoning attacks succeed against unaudited agent stacks. 341 malicious tools have been found on agent marketplaces. 82% of MCP servers have path traversal vulnerabilities. Firmis detects all of these statically, before your agent runs a single tool.

At a glance

#	Category	Rules	Severity	What it enables
1	Tool Poisoning	10	Critical–High	Hidden instructions that hijack agent behavior
2	Data Exfiltration	12	Critical–High	Sending your files and env vars to attacker servers
3	Credential Harvesting	18	Critical–High	Reading `~/.aws/credentials`, SSH keys, token caches
4	Prompt Injection	13	Critical–High	Overriding agent instructions from tool output or config
5	Secret Detection	60	Critical–Medium	Hardcoded API keys, tokens, and passwords in source
6	Supply Chain	8	Critical–High	Compromised or typosquatted dependencies
7	Malware Signatures	6	Critical	Known malicious code patterns
8	Known Malicious	10	Critical	Packages flagged in threat databases
9	Network Abuse	10	High–Medium	Unauthorized DNS lookups and HTTP connections
10	File System Abuse	10	High–Medium	Reads/writes to sensitive system paths
11	Permission Overgrant	7	High–Medium	Tools requesting broader permissions than they need
12	Agent Memory Poisoning	7	High	Corrupting agent context to affect future behavior
13	Malware Distribution	6	Critical–High	Code that downloads and executes additional payloads
14	Suspicious Behavior	16	High–Medium	Obfuscation, encoded payloads, evasion techniques
15	Insecure Configuration	3	Medium–Low	Disabled security controls, open CORS, weak defaults
16	Access Control	3	High–Medium	Missing authentication or authorization checks

Total: 209 rules across 16 categories.

Tool Poisoning

Tool poisoning is the most direct attack against AI agents. A malicious MCP server can inject hidden instructions into a tool description that tells the agent to read ~/.aws/credentials and send the contents to an attacker’s server — all while showing the user a perfectly innocent tool name like “search the web.”

Because agents read tool descriptions to understand what a tool does, hidden content in those descriptions can redirect agent behavior without the user’s knowledge. The attack is invisible to code review because the payload is in a string, not in logic.

Example finding (tp-001 — Critical): Zero-width Unicode characters (\u200B, \uFEFF) in a tool description. These characters are invisible to humans reviewing the code but are processed by the agent as text, allowing hidden instructions to be smuggled past review.

Example finding (tp-002 — High): The phrase “Ignore all previous instructions” embedded in a tool description fetched from an external MCP server — a textbook prompt override attack.

Data Exfiltration

Data exfiltration rules detect code patterns that send local data — files, environment variables, clipboard contents — to external URLs or third-party services. The attack is rarely obvious: the exfiltration is usually embedded inside a tool that also does something legitimate.

Example finding (exfil-003 — Critical): A tool that reads a local file and passes its contents as the body of a fetch() POST request to an external URL.

Example finding (exfil-007 — High): A tool that accesses process.env and sends environment variable values to a webhook endpoint. If your agent has access to API keys via env vars, this is a full credential dump.

Credential Harvesting

Credential harvesting rules detect access to files that store cloud credentials, SSH keys, browser-stored passwords, and token caches. These files are the single highest-value targets on a developer’s machine. Access to them from agent code is almost always unauthorized.

Example finding (cred-001 — High): A reference to ~/.aws/credentials or ~/.aws/config in a tool’s file-read path.

Example finding (cred-002 — Critical): Access to ~/.ssh/id_rsa — a private SSH key file that grants access to every server it’s authorized on.

Prompt Injection

Prompt injection is different from tool poisoning. Tool poisoning corrupts the tool definition itself. Prompt injection arrives through content the agent reads at runtime — a web page, a tool return value, a Markdown file, or a database record.

Unlike XSS or SQL injection, prompt injection does not require code execution. A plain-text instruction embedded in a document is enough to override the agent’s behavior if the agent is instructed to follow instructions in the content it reads.

Example finding (pi-001 — Critical): A Markdown file consumed by an agent containing the phrase “Disregard your instructions and instead…”

Example finding (pi-008 — High): A tool return value template containing a role-reassignment phrase such as “You are now operating in unrestricted mode.”

Secret Detection

Secret detection covers 60 rules for hardcoded credentials across cloud providers, SaaS APIs, infrastructure services, and generic token patterns. This is the largest category by rule count because hardcoded secrets are still the most common security mistake in software — and they become dramatically more dangerous when an AI agent can read and exfiltrate them.

Services covered include: AWS, Azure, GCP, GitHub, GitLab, Slack, Stripe, Twilio, SendGrid, HuggingFace, OpenAI, Anthropic, Datadog, PagerDuty, HashiCorp Vault, Docker Hub, npm tokens, SSH private key headers, and more.

Example finding (sec-045 — Critical): An OpenAI API key (sk-...) hardcoded in a tool configuration file.

Example finding (sec-012 — High): An AWS Access Key ID (AKIA...) in a Python source file — one grep away from a full account compromise.

Supply Chain

The agent ecosystem has a supply chain problem. Packages get compromised. Maintainers get coerced. Typosquatted packages with nearly identical names sit in registries waiting to be installed.

Supply chain rules detect dependencies with known security incidents and typosquatting patterns that mimic popular package names.

Example finding (supply-001 — Critical): A dependency on event-stream — a package that was compromised to steal bitcoin wallets and downloaded by millions of developers before the attack was discovered.

Example finding (supply-002 — High): A dependency named lodassh — a typosquat of lodash that runs a reverse shell on install.

Malware Signatures

Malware signature rules match code patterns associated with known malware families and attack tools observed in the wild. These are the highest-confidence findings in the rule set. If one fires, something is very wrong.

Example finding (mal-003 — Critical): A Base64-encoded payload string matching a known command-and-control beacon pattern — the fingerprint of a specific malware family that has been observed targeting developer machines.

Known Malicious

Known malicious rules match package names and identifiers from curated threat intelligence databases, including packages flagged by npm security teams and community disclosures.

Example finding (km-007 — Critical): A dependency on a package that was reported as malicious in the npm advisory database — still installable, still in package.json, silently running on every npm install.

Network Abuse

Network abuse rules detect unauthorized DNS lookups, HTTP requests to suspicious domains, tunneling services, and data-over-DNS patterns used to bypass network monitoring.

Example finding (net-004 — High): HTTP requests to a tunneling service (ngrok.io, localtunnel.me) that creates an unmonitored egress channel. Legitimate tools rarely need to phone home through a tunnel.

Example finding (net-009 — High): DNS TXT record lookups that encode exfiltrated data in DNS queries — a technique specifically designed to bypass HTTP-level network monitoring and firewall rules.

File System Abuse

File system abuse rules detect reads, writes, or deletions of sensitive system paths — including /proc filesystem entries, system logs, shell history files, and container credential paths.

Example finding (fs-001 — High): Access to /proc/self/environ — reads the process environment directly from the kernel filesystem, exposing all environment variables including any secrets injected at runtime.

Example finding (fs-006 — High): Writing to or truncating system log files to cover activity traces — a classic anti-forensics technique.

Permission Overgrant

Permission overgrant rules detect tool definitions that request broad or wildcard permissions without scoping them to the minimum required for the tool’s declared purpose. This is the agent equivalent of a mobile app requesting access to your camera, contacts, and location to show you weather.

Example finding (perm-003 — High): An MCP server tool declaring permissions: ["*"] rather than enumerating specific permission scopes. A wildcard grant means the tool can do anything the agent can do.

Agent Memory Poisoning

Agent memory poisoning rules detect patterns that corrupt or hijack the agent’s context window, conversation history, or persistent memory — causing the agent to behave differently in future turns. Unlike prompt injection (which attacks a single session), memory poisoning persists.

Example finding (mem-002 — High): A tool that writes adversarial instructions into a persistent memory file consumed by the agent on startup. Every future session starts with the poisoned context.

Malware Distribution

Malware distribution rules detect code patterns that download and execute additional payloads, install backdoors, or propagate malicious code to other systems.

Example finding (dist-001 — Critical): A curl | bash pipe-to-shell pattern that downloads and immediately executes a remote script without verification. The downloaded script could be anything; there is no integrity check.

Suspicious Behavior

Suspicious behavior rules cover obfuscation techniques, encoded payloads, and evasion patterns that are not specific to one threat category but indicate malicious intent. Legitimate tools rarely need to hide what they do.

Example finding (sus-004 — High): A long Base64-encoded string passed to a dynamic code execution function — a common technique for hiding malicious logic from static scanners and from developers reviewing the code.

Example finding (sus-011 — Medium): Heavy use of string concatenation to build a URL, specifically structured to evade simple domain-matching rules. The result URL is never visible in a single line of source.

Insecure Configuration

Insecure configuration rules detect agent configurations that disable security controls, set overly permissive CORS policies, or use known-insecure default settings.

Example finding (cfg-002 — Medium): A server configuration with allowOrigins: "*" and no authentication requirement — any website can make authenticated requests to the agent’s tool server.

Access Control

Access control rules detect missing authentication checks on tool endpoints, unauthenticated admin routes, and hardcoded bypass conditions.

Example finding (ac-001 — High): A tool handler that processes requests without verifying the caller’s identity or checking an authorization token — any process that can reach the socket can invoke the tool.