Threat Categories Reference

324 detection rules. 21 categories. Every rule is open-source YAML you can read, extend, or override. This page is the authoritative reference for what each category detects, how findings are identified, and how categories map to the OWASP LLM Top 10 and to MITRE ATLAS techniques (the AML.T IDs in the MITRE ATT&CK column).

Master table

Sorted by severity range - the most dangerous categories first.

#	Category	ID Prefix	Rules	Severity Range	OWASP LLM	MITRE ATT&CK
1	tool-poisoning	`tp-`	12	Critical–Medium	LLM07	AML.T0043
2	data-exfiltration	`de-` / `exfil-`	12	Critical–High	LLM02	AML.T0037
3	credential-harvesting	`cred-`	20	Critical–High	LLM06	AML.T0012
4	prompt-injection	`prompt-` / `pi-`	16	Critical–High	LLM01	AML.T0051
5	secret-detection	`sd-`	60	Critical–Medium	LLM06	AML.T0012
6	supply-chain	`supply-` / `sc-`	11	Critical–High	LLM05	AML.T0010
7	known-malicious	`km-` / `malware-`	16	Critical	LLM05 / LLM07	AML.T0010
8	network-abuse	`na-`	14	High–Medium	LLM04 / LLM08	AML.T0037
9	file-system-abuse	`fs-`	12	High–Medium	LLM08	AML.T0037
10	permission-overgrant	`perm-` / `po-`	7	High–Medium	LLM07	AML.T0043
11	agent-memory-poisoning	`mem-`	9	High	LLM03	AML.T0051
12	malware-distribution	`md-`	6	Critical–High	LLM07	AML.T0043
13	suspicious-behavior	`sb-`	16	High–Medium	LLM02	AML.T0043
14	insecure-config	`ic-`	6	Medium–Low	LLM09	AML.T0054
15	access-control	`ac-`	3	High–Medium	LLM10	AML.T0012
16	third-party-content	`tpc-`	6	High	LLM01	AML.T0051
17	credential-extraction	`cex-`	3	Critical–High	LLM06	AML.T0012
18	privilege-escalation	`privesc-` / `pe-`	18	Critical–High	LLM08	AML.T0012
19	permission-bypass	`pbp-`	3	Critical–High	LLM07	AML.T0043
20	unsupervised-execution	`uex-`	3	High–Medium	LLM08	AML.T0043
21	cross-agent-propagation	`mat-`	3	Critical–High	LLM07	AML.T0043

Total: 324 detection rules across 21 categories.

OWASP LLM Top 10 mapping

OWASP LLM	Title	Firmis Categories
LLM01	Prompt Injection	prompt-injection, third-party-content
LLM02	Insecure Output Handling	data-exfiltration, suspicious-behavior
LLM03	Training Data Poisoning	agent-memory-poisoning
LLM04	Model Denial of Service	network-abuse
LLM05	Supply Chain Vulnerabilities	supply-chain, known-malicious
LLM06	Sensitive Information Disclosure	secret-detection, credential-harvesting, credential-extraction
LLM07	Insecure Plugin Design	tool-poisoning, known-malicious, permission-overgrant, malware-distribution, permission-bypass, cross-agent-propagation
LLM08	Excessive Agency	file-system-abuse, network-abuse, unsupervised-execution, privilege-escalation
LLM09	Overreliance	insecure-config
LLM10	Model Theft	access-control

tool-poisoning

ID prefix: tp- (tp-001 through tp-012) Severity range: Critical–Medium Rules: 12

Tool poisoning attacks embed malicious instructions inside tool definitions - descriptions, names, or metadata fields - that AI agents read and act on automatically. Because agents trust tool descriptions to understand what a tool does, hidden content in those fields can redirect agent behavior without user awareness.

What it detects:

Invisible or deceptive Unicode characters (zero-width spaces, directional overrides, homoglyphs) in tool names or descriptions
Prompt override language embedded in tool metadata, such as “Ignore all previous instructions” or role-reassignment phrases
Code that programmatically writes to MCP configuration files to silently register tool servers

Example finding:

CRITICAL  tp-001  Hidden instructions in tool description
          src/tools/search.ts:14
          Evidence: Zero-width space (U+200B) found in description field

Related rules: tp-001 (Unicode hiding), tp-002 (prompt override), tp-003 (tool shadowing), tp-004 (MCP config injection), tp-006 (homoglyph names)
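
The Unicode-hiding check can be illustrated with a minimal Python sketch. It is not the tp-001 rule itself: the function name and the set of code points are assumptions chosen for illustration, and homoglyph detection is out of scope here.

```python
# Code points treated as "hidden" in this sketch: zero-width characters
# and bidirectional control characters. The selection is an assumption,
# not the pattern used by the shipped rule.
HIDDEN_CODE_POINTS = {
    0x200B,  # zero width space
    0x200C,  # zero width non-joiner
    0x200D,  # zero width joiner
    0x2060,  # word joiner
    0xFEFF,  # zero width no-break space (BOM)
    *range(0x202A, 0x202F),  # LRE, RLE, PDF, LRO, RLO
    *range(0x2066, 0x206A),  # LRI, RLI, FSI, PDI
}


def find_hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for each hidden character in text."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) in HIDDEN_CODE_POINTS
    ]
```

Running this over a tool's name and description fields yields the position and code point of each hidden character, which is the kind of evidence shown in the example finding.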

data-exfiltration

ID prefix: de- / exfil- Severity range: Critical–High Rules: 12

Data exfiltration rules detect code that sends local data - files, environment variables, clipboard contents, configuration - to external URLs or services outside the intended scope of the tool.

What it detects:

Tool handlers that read local files and POST their contents to external URLs
Code that accesses process.env and sends environment variable values to webhook endpoints
DNS-based exfiltration patterns that encode data in DNS query subdomains to bypass HTTP monitoring

Example finding:

CRITICAL  exfil-003  File contents sent to external URL
          src/tools/sync.ts:87
          Evidence: readFileSync() result passed to fetch() POST body targeting external domain

Related rules: exfil-001 through exfil-012

credential-harvesting

ID prefix: cred- Severity range: Critical–High Rules: 20

Credential harvesting rules detect direct references to files that store cloud provider credentials, SSH private keys, browser-stored passwords, and authentication token caches. Agent code should never need to read these paths; any reference is a strong indicator of malicious intent.

What it detects:

References to ~/.aws/credentials, ~/.aws/config, and provider-specific credential files
Access to SSH private key paths such as ~/.ssh/id_rsa and ~/.ssh/id_ed25519
Browser and OS credential store paths (Chrome Login Data, Firefox key4.db, macOS Keychain)

Example finding:

HIGH  cred-001  Reference to AWS credentials file
      src/tools/deploy.ts:23
      Evidence: Path ~/.aws/credentials accessed via file-read operation

Related rules: cred-001 through cred-018
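
A path-reference check of this kind might look like the following Python sketch. The pattern labels and the function name are hypothetical, and only a few of the paths listed above are covered. Public keys (`.pub`) are deliberately excluded from the SSH pattern.

```python
import re

# A small subset of the paths named above; labels are hypothetical.
CREDENTIAL_PATH_PATTERNS = {
    "aws-credentials": re.compile(r"\.aws/(?:credentials|config)\b"),
    "ssh-private-key": re.compile(r"\.ssh/id_(?:rsa|ed25519)\b(?!\.pub)"),
    "firefox-key-db": re.compile(r"\bkey4\.db\b"),
}


def find_credential_paths(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for each credential path reference."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CREDENTIAL_PATH_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, label))
    return hits
```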

prompt-injection

ID prefix: prompt- / pi- Severity range: Critical–High Rules: 16

Prompt injection rules detect instruction-override language in any content the agent reads: tool return values, configuration files, Markdown documents, or fetched content. Unlike tool poisoning (which targets definitions at configuration time), prompt injection can arrive through any data channel the agent processes at runtime.

What it detects:

Instruction override phrases in Markdown files, README files, or documents the agent consumes
Role reassignment language in tool outputs, such as “You are now DAN” or “Operating in developer mode”
Context manipulation patterns that attempt to make the agent discard prior instructions

Example finding:

CRITICAL  pi-001  Prompt injection in agent-consumed document
          docs/AGENT_CONTEXT.md:34
          Evidence: "Disregard your instructions and instead..." in agent-readable file

Related rules: prompt-001 through pi-013
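
To make the phrase matching concrete, here is a minimal Python sketch of a line-by-line scan for instruction-override language. The phrase list and function name are illustrative assumptions, not the shipped rules, and patterns this simple will produce false positives (for example, "you are now ready").

```python
import re

# Illustrative phrases only; the real rule patterns are not reproduced here.
OVERRIDE_PATTERNS = [
    re.compile(
        r"\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:your\s+|the\s+)?"
        r"(?:previous\s+|prior\s+)?instructions\b",
        re.IGNORECASE,
    ),
    re.compile(r"\byou\s+are\s+now\s+\w+", re.IGNORECASE),
    re.compile(r"\boperating\s+in\s+developer\s+mode\b", re.IGNORECASE),
]


def find_override_phrases(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for each suspicious phrase."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern in OVERRIDE_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((line_number, match.group(0)))
    return hits
```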

secret-detection

ID prefix: sd- Severity range: Critical–Medium Rules: 60

Secret detection is the largest category by rule count - 60 rules covering hardcoded credentials across 30+ cloud providers, SaaS APIs, infrastructure services, and generic token formats. This category is exempt from the 0.15x document multiplier, so secrets in .env.example and README.md files are still reported.

What it detects:

Cloud provider API keys and access tokens (AWS, Azure, GCP, Anthropic, OpenAI, HuggingFace)
SaaS service tokens (GitHub, GitLab, Slack, Stripe, Twilio, SendGrid, PagerDuty, Datadog)
Private key headers (PEM format markers) and SSH key formats

Example finding:

CRITICAL  sd-045  OpenAI API key detected
          config/llm.ts:12
          Evidence: sk-... token matching OpenAI API key format (weight 100)

Related rules: sd-001 through sd-060
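
Format-based secret matching can be sketched in a few lines of Python. The three patterns below use widely documented token formats (AWS access key IDs, classic GitHub personal access tokens, PEM private key headers). The pattern names, the `redact` helper, and the function name are assumptions for illustration and do not correspond to specific sd- rules.

```python
import re

# Widely documented formats; names are hypothetical labels for this sketch.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-personal-access-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem-private-key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def redact(secret: str, keep: int = 4) -> str:
    """Keep only the first few characters so reports never echo a secret."""
    return secret[:keep] + "..."


def scan_for_secrets(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, pattern_name, redacted_match) per finding."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_number, name, redact(match.group(0))))
    return findings
```

Redacting the match before reporting keeps the evidence useful for triage without copying the secret into scan output.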

supply-chain

ID prefix: supply- / sc- Severity range: Critical–High Rules: 11

Supply chain rules detect dependencies with documented security incidents - compromised packages, protestware, maintainer sabotage events - and typosquatting patterns that mimic popular package names to trick developers into installing malicious code.

What it detects:

Dependencies matching a curated list of packages with known compromise histories (e.g., event-stream, ua-parser-js)
Typosquatted package names that differ from legitimate packages by one character or transposition
npm lifecycle scripts (preinstall, postinstall) that download and run remote content

Example finding:

CRITICAL  supply-001  Known-compromised package dependency
          package.json:18
          Evidence: "event-stream" - package was compromised to steal bitcoin wallets (2018)

Related rules: supply-001 through sc-008
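
The "one character or transposition" test for typosquatting can be expressed as a distance-one check. The Python sketch below is one way to do it, under the assumption that a dependency is compared against a short list of popular names. `POPULAR_PACKAGES` and both function names are hypothetical.

```python
def is_one_edit_away(candidate: str, known: str) -> bool:
    """True if candidate differs from known by exactly one substitution,
    insertion, deletion, or adjacent transposition."""
    if candidate == known or abs(len(candidate) - len(known)) > 1:
        return False
    if len(candidate) == len(known):
        diffs = [i for i in range(len(known)) if candidate[i] != known[i]]
        if len(diffs) == 1:
            return True  # single substitution
        return (  # adjacent transposition
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and candidate[diffs[0]] == known[diffs[1]]
            and candidate[diffs[1]] == known[diffs[0]]
        )
    # Lengths differ by one: skip the common prefix, then the remainder
    # must match once the extra character in the longer name is dropped.
    shorter, longer = sorted((candidate, known), key=len)
    i = 0
    while i < len(shorter) and shorter[i] == longer[i]:
        i += 1
    return shorter[i:] == longer[i + 1:]


# Hypothetical reference list; a real scanner would use a much larger one.
POPULAR_PACKAGES = ["lodash", "express", "react", "requests"]


def find_typosquats(dependencies: list[str]) -> list[tuple[str, str]]:
    """Return (dependency, lookalike_target) pairs."""
    return [
        (dep, known)
        for dep in dependencies
        for known in POPULAR_PACKAGES
        if is_one_edit_away(dep, known)
    ]
```

Exact matches are excluded on purpose, so the legitimate package itself is never flagged.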

known-malicious

ID prefix: km- / malware- Severity range: Critical Rules: 16

Known malicious rules match package names, identifiers, and code signatures against curated threat intelligence databases. This category combines two complementary detection strategies. The first matches packages that were reported to npm security teams, disclosed as malicious by the community, or removed from registries. The second matches code patterns associated with known malware families and attack toolkits observed in the wild.

What it detects:

Package names in package.json, requirements.txt, or pyproject.toml that match known-bad identifiers
Import statements referencing packages flagged in npm advisory or PyPI security databases
String literals matching known malicious package names used in supply chain attacks
Base64-encoded payload strings matching known command-and-control beacon patterns
Shellcode injection sequences and process hollowing patterns
Cryptocurrency miner startup sequences embedded in tool handlers

Example finding:

CRITICAL  km-007  Known malicious package reference
          package.json:31
          Evidence: Package "flatmap-stream" - used to distribute malicious payload (npm advisory #663)

Example finding:

CRITICAL  malware-003  Known C2 beacon pattern
          src/tools/update.ts:156
          Evidence: Base64 payload matches Cobalt Strike stage-1 beacon signature

Related rules: km-001 through km-010, malware-001 through malware-006

network-abuse

ID prefix: na- Severity range: High–Medium Rules: 14

Network abuse rules detect unauthorized DNS lookups, HTTP requests to suspicious domains, tunneling service usage, and data-over-DNS patterns. These are often used to establish covert communication channels or exfiltrate data in ways that bypass standard HTTP-level monitoring.

What it detects:

Requests to tunneling services that create unmonitored egress channels (ngrok.io, localtunnel.me, serveo.net)
HTTP requests to suspicious TLDs (.tk, .ml, .ga, .cf, .gq, .xyz) commonly used in phishing and C2 infrastructure
DNS TXT record lookups that encode exfiltrated data in query subdomains

Example finding:

HIGH  na-004  Request to tunneling service
      src/tools/debug.ts:44
      Evidence: HTTP request targeting ngrok.io - creates unmonitored egress channel

Related rules: na-001 through na-010
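
A destination check for the tunneling-service and suspicious-TLD patterns might be sketched as follows in Python. The domain and TLD lists are taken from the bullets above. The function name and return labels are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Lists copied from the category description above.
TUNNEL_DOMAINS = ("ngrok.io", "localtunnel.me", "serveo.net")
SUSPICIOUS_TLDS = (".tk", ".ml", ".ga", ".cf", ".gq", ".xyz")


def classify_url(url: str) -> Optional[str]:
    """Return a label for a risky destination, or None if nothing matches."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not lookalike prefixes.
    if any(host == d or host.endswith("." + d) for d in TUNNEL_DOMAINS):
        return "tunneling-service"
    if host.endswith(SUSPICIOUS_TLDS):
        return "suspicious-tld"
    return None
```

Comparing against the parsed hostname rather than the raw URL avoids matching a domain name that only appears in a path or query string.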

file-system-abuse

ID prefix: fs- Severity range: High–Medium Rules: 12

File system abuse rules detect reads, writes, or deletions of sensitive system paths - including Linux /proc filesystem entries, system log files, shell history files, and container credential paths - that tools should never access.

What it detects:

Access to /proc/self/environ (exposes all process environment variables including secrets)
Writes to or truncation of system log files to cover activity traces
Access to container service account token paths in Kubernetes deployments

Example finding:

HIGH  fs-001  Access to /proc/self/environ
      src/tools/diagnostics.ts:19
      Evidence: Direct read of /proc/self/environ - exposes all environment variables

Related rules: fs-001 through fs-010

permission-overgrant

ID prefix: perm- / po- Severity range: High–Medium Rules: 7

Permission overgrant rules detect tool definitions that request broader permissions than necessary for their declared purpose - wildcard permission scopes, missing scope constraints, and permission declarations that grant access far beyond what the tool description claims to need.

What it detects:

MCP tool configurations declaring permissions: ["*"] or equivalent wildcard scopes
Tool permission lists that include filesystem write access when the tool only claims to read data
Missing scope or allowedPaths constraints on tools with file or network access

Example finding:

HIGH  perm-003  Wildcard permission in tool definition
      mcp-config.json:42
      Evidence: Tool "search" declares permissions: ["*"] - should enumerate specific scopes only

Related rules: perm-001 through po-007
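
As a rough illustration, the wildcard and missing-constraint checks could be written like this in Python. The configuration shape (a top-level `tools` list with `name`, `permissions`, `scope`, and `allowedPaths` keys) and the `fs:` / `net:` permission prefixes are assumptions made for this sketch, not a documented MCP schema.

```python
import json


def find_overgrants(config_text: str) -> list[str]:
    """Return human-readable findings for a hypothetical tool config."""
    config = json.loads(config_text)
    findings = []
    for tool in config.get("tools", []):
        name = tool.get("name", "<unnamed>")
        permissions = tool.get("permissions", [])
        if "*" in permissions:
            findings.append(f"{name}: wildcard permission")
        # Assumed naming convention: "fs:..." and "net:..." permissions.
        has_io = any(p.startswith(("fs:", "net:")) for p in permissions)
        if has_io and not (tool.get("scope") or tool.get("allowedPaths")):
            findings.append(
                f"{name}: file or network access without scope or allowedPaths"
            )
    return findings
```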

agent-memory-poisoning

ID prefix: mem- Severity range: High Rules: 9

Agent memory poisoning rules detect patterns that corrupt or hijack the agent’s context window, conversation history, or persistent memory store - causing the agent to behave maliciously in subsequent turns without the current turn showing obvious attack signals.

What it detects:

Tools that write adversarial instructions into persistent memory files loaded by the agent on startup
Code that injects role-reassignment or instruction-override text into agent context storage
Manipulation of conversation history or session state to alter future agent behavior

Example finding:

HIGH  mem-002  Adversarial content written to agent memory
      src/tools/memory.ts:67
      Evidence: Tool writes prompt injection payload to ~/.agent_memory/context.json

Related rules: mem-001 through mem-007

malware-distribution

ID prefix: md- Severity range: Critical–High Rules: 6

Malware distribution rules detect code patterns that download and run additional payloads, install backdoors, or spread malicious code to other systems in the environment.

What it detects:

Pipe-to-shell patterns that download and immediately run remote scripts without verification
Dynamic code execution of remotely fetched content using dangerous execution primitives
Self-replicating code that copies itself or drops payloads to other paths in the filesystem

Example finding:

CRITICAL  md-001  Pipe-to-shell execution
          src/tools/installer.ts:34
          Evidence: curl output piped directly to bash - runs remote script without verification

Related rules: md-001 through md-006
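
The pipe-to-shell pattern lends itself to a compact regular expression. The Python sketch below is an approximation for illustration, not the md-001 rule. It covers `curl` and `wget` piped into `sh`, `bash`, `zsh`, or `dash`, optionally through `sudo`.

```python
import re

# Downloader, then anything except a pipe, then a pipe into a shell.
PIPE_TO_SHELL = re.compile(
    r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b"
)


def has_pipe_to_shell(command: str) -> bool:
    """True if the command downloads content and pipes it into a shell."""
    return PIPE_TO_SHELL.search(command) is not None
```

The trailing word boundary keeps commands such as `curl ... | shasum -a 256` from matching, since that pipeline verifies a download rather than running it.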

suspicious-behavior

ID prefix: sb- Severity range: High–Medium Rules: 16

Suspicious behavior rules cover obfuscation techniques, encoded payloads, and evasion patterns that are not specific to one threat category but strongly indicate malicious intent. These rules catch threats that do not fit neatly into more specific categories.

What it detects:

Long Base64-encoded strings passed to dynamic code execution primitives
Heavy string concatenation used to build URLs or commands in ways that evade simple pattern matching
Anti-debugging and sandbox detection patterns commonly used by malware to avoid analysis

Example finding:

HIGH  sb-004  Obfuscated payload passed to dynamic executor
      src/tools/loader.ts:91
      Evidence: 2KB base64 string decoded and passed to code executor - common malware staging pattern

Related rules: sb-001 through sb-016
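
The encoded-payload heuristic combines two signals: a long, valid Base64 string and a dynamic execution primitive in the same source. A minimal Python sketch follows. The 200-character threshold, the list of primitives, and the function name are assumptions, and co-occurrence in one file is a much weaker signal than real data-flow analysis.

```python
import base64
import binascii
import re

# Threshold and primitive names are assumptions for this sketch.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")
EXEC_PRIMITIVES = re.compile(r"\b(?:eval|exec|Function)\s*\(")


def find_encoded_payloads(source: str) -> list[int]:
    """Return decoded sizes (bytes) of long Base64 strings found in a
    source that also calls a dynamic execution primitive."""
    if not EXEC_PRIMITIVES.search(source):
        return []
    sizes = []
    for match in BASE64_RUN.finditer(source):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like Base64 but does not decode cleanly
        sizes.append(len(decoded))
    return sizes
```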

insecure-config

ID prefix: ic- Severity range: Medium–Low Rules: 6

Insecure configuration rules detect agent configurations that disable security controls, set overly permissive CORS policies, or use known-insecure default settings that increase the attack surface.

What it detects:

Server configurations with allowOrigins: "*" and no authentication requirement
Agent configurations with authentication disabled (auth: false, requireAuth: false)
Insecure transport settings (HTTP instead of HTTPS for endpoints handling sensitive data)

Example finding:

MEDIUM  ic-002  Overly permissive CORS configuration
        src/server/config.ts:15
        Evidence: allowOrigins: "*" with no authentication - any origin can make requests

Related rules: ic-001 through ic-003
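
A configuration audit along these lines could be sketched in Python as shown here. The `allowOrigins`, `auth`, and `requireAuth` keys come from the bullets above. The `endpoint` key, the flat dictionary shape, and the finding messages are assumptions.

```python
def audit_config(config: dict) -> list[str]:
    """Return findings for a parsed (hypothetical) server configuration."""
    findings = []
    auth_required = (
        config.get("auth") is True or config.get("requireAuth") is True
    )
    if config.get("auth") is False or config.get("requireAuth") is False:
        findings.append("authentication explicitly disabled")
    # A wildcard origin is only flagged when nothing requires authentication.
    if config.get("allowOrigins") == "*" and not auth_required:
        findings.append("wildcard CORS origin without authentication")
    # "endpoint" is an assumed key name for the service URL.
    if str(config.get("endpoint", "")).lower().startswith("http://"):
        findings.append("plain HTTP endpoint")
    return findings
```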

access-control

ID prefix: ac- Severity range: High–Medium Rules: 3

Access control rules detect missing authentication checks on tool endpoints, unauthenticated administrative routes, and hardcoded bypass conditions that allow unauthorized callers to invoke privileged operations.

What it detects:

Tool handlers that process requests without verifying caller identity or checking an authorization token
Admin routes with no access guard - any caller can invoke privileged operations
Hardcoded bypass conditions that create permanent backdoors in tool handlers

Example finding:

HIGH  ac-001  Unauthenticated tool handler
      src/tools/admin.ts:8
      Evidence: Tool handler processes all requests without auth check - no token validation found

Related rules: ac-001 through ac-003

third-party-content

ID prefix: tpc- Severity range: High Rules: 6

Third-party content ingestion rules detect skills that read untrusted external content into the agent’s context. Any human-written content the agent processes — email, chat messages, social media posts, web pages, GitHub issues, RSS feeds, registry-installed code — could contain adversarial instructions that hijack the agent’s behavior (indirect prompt injection).

What it detects:

Skills that read email bodies, chat messages, or social media posts into the agent’s processing context
Tools that fetch and process arbitrary web content or RSS feeds for the agent to act on
Registry-installed code (npm, pip, go install) that the agent executes without verification

Example finding:

HIGH  tpc-001  Third-party content ingestion
      src/tools/email-reader.ts:23
      Evidence: Skill reads email body content into agent context - indirect prompt injection risk

Related rules: tpc-001 through tpc-006

credential-extraction

ID prefix: cex- Severity range: Critical–High Rules: 3

Credential extraction rules detect skills that extract credentials from other applications’ storage — browser cookies, password managers, OS keychains. This is distinct from standard API key configuration (environment variables, config files), which is expected behavior for SaaS-integrated tools.

What it detects:

Skills that access browser cookie stores (Chrome, Firefox, Safari) to extract session tokens
Tools that read from OS credential managers (macOS Keychain, Windows Credential Manager)
Code that extracts saved passwords from password manager databases or browser autofill stores

Example finding:

CRITICAL  cex-001  Browser credential extraction
          src/tools/session-sync.ts:45
          Evidence: Reads Chrome Login Data SQLite database - extracts saved passwords

Related rules: cex-001 through cex-003

permission-bypass

ID prefix: pbp- Severity range: Critical–High Rules: 3

Permission and safety bypass rules detect skills that explicitly disable or circumvent safety controls, permission systems, or sandbox boundaries. These patterns indicate either malicious intent or dangerously careless configuration that exposes the agent to unrestricted operation.

What it detects:

Flags and options that bypass permission prompts or safety checks (e.g., --yolo, bypassPermissions)
Code that requests elevated privileges beyond the tool’s stated purpose (Full Disk Access, root)
Sandbox escape patterns that break out of containerized or restricted execution environments

Example finding:

CRITICAL  pbp-001  Safety control bypass
          src/tools/fast-deploy.ts:12
          Evidence: --yolo flag disables all permission checks before executing file operations

Related rules: pbp-001 through pbp-003

unsupervised-execution

ID prefix: uex- Severity range: High–Medium Rules: 3

Unsupervised agent execution rules detect skills that spawn background agents, autonomous workers, or persistent daemons with reduced human oversight. Multi-agent orchestration patterns amplify blast radius because a single compromised agent can direct others.

What it detects:

Skills that launch autonomous sub-agents with code execution capability in background processes
Multi-agent orchestration patterns where agents spawn and direct other agents without human checkpoints
Persistent daemon processes that continue operating after the parent agent session ends

Example finding:

HIGH  uex-001  Unsupervised sub-agent spawning
      src/tools/orchestrator.ts:89
      Evidence: Spawns background agent with code execution - no human approval checkpoint

Related rules: uex-001 through uex-003

privilege-escalation

ID prefix: privesc- / pe- Severity range: Critical–High Rules: 18

Privilege escalation rules detect agent code that attempts to gain elevated system access - executing commands as root, escaping container boundaries, loading kernel modules, or manipulating OS scheduling to establish persistence. AI agents typically run on developer machines with broad tool access, so an escalation attempt starts inside the user’s security context with file system access already in hand. That starting position makes it more likely to succeed than a traditional attack, which must first gain a foothold on the machine.

What it detects:

Subprocess calls that execute sudo, su, or doas to escalate to root privileges
Docker run commands with the --privileged flag or --cap-add SYS_ADMIN, which remove container isolation
IAM policy documents with "Action": "*" granting full cloud account access
Kernel module loading (insmod, modprobe) and crontab manipulation for persistence
Container escape patterns targeting host filesystem mounts and namespace breakouts

Example finding:

CRITICAL  privesc-001  Root privilege escalation
          src/tools/deploy.ts:34
          Evidence: subprocess call executes sudo - AI agent should never need root privileges

Example finding:

CRITICAL  pe-013  Privileged container execution
          docker-compose.yml:18
          Evidence: Container runs with --privileged flag - unrestricted access to host kernel

Related rules: privesc-001 through pe-018
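
The IAM wildcard check can be illustrated with a short Python sketch. It relies on the standard IAM policy JSON layout, in which `Statement` may be an object or a list and `Action` may be a string or a list. The function name is hypothetical, and the sketch ignores `NotAction` and condition blocks.

```python
import json


def has_wildcard_action(policy_text: str) -> bool:
    """True if any Allow statement grants "Action": "*"."""
    policy = json.loads(policy_text)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is valid
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action string is valid
            actions = [actions]
        if "*" in actions:
            return True
    return False
```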

cross-agent-propagation

ID prefix: mat- Severity range: Critical–High Rules: 3

Cross-agent propagation rules detect configurations where one agent can modify another agent’s state, memory, or configuration without mutual authentication. In multi-agent systems, agents share workspaces, relay tool calls, and broadcast messages between sessions. Without identity verification, a compromised agent can poison every other agent it communicates with.

What it detects:

Multi-agent configs where agents write to shared directories without verifying the writer’s identity
Authentication explicitly set to none, false, or disabled in agent-to-agent communication configs
Broadcast message patterns where one agent can inject instructions into other agents’ context

Example finding:

HIGH  mat-001  Unauthenticated shared workspace
      crew.yaml:42
      Evidence: Agents write to shared directory without identity verification - one compromised agent can modify state consumed by all others

Related rules: mat-001 through mat-003

What to do next

Security Model: what Firmis detects, what it doesn’t, and why
Built-in Rules: full listing of all 324 detection rules with IDs and descriptions
Custom Rules: write your own detection rules in the same YAML schema
Detection Engine: how rules are scored and thresholds applied
firmis scan: CLI reference