Detection Engine

Traditional security scanners look for known CVEs and malware hashes. Agent threats are different — they hide in natural language, YAML configs, and tool metadata. A malicious tool description is valid JSON. A prompt injection is a plain text string. A credential path reference is just a string literal. None of these trigger conventional scanners.

Firmis uses a YARA-inspired pattern engine designed specifically for this. 209 rules. 7 matcher types. Confidence scoring that suppresses noise without missing real threats.

Each rule in a YAML file has this structure:

rules/tool-poisoning.yaml (excerpt)
rules:
  - id: tp-001
    name: Hidden Instructions in Tool Descriptions
    description: "Detects invisible Unicode characters used to hide instructions"
    category: tool-poisoning
    severity: critical
    version: "1.0.0"
    enabled: true
    confidenceThreshold: 50
    patterns:
      - type: regex
        pattern: '[\u200B\u200C\u200D\uFEFF]'
        weight: 95
        description: "Zero-width space or BOM character"
      - type: regex
        pattern: '<!--[\s\S]*?-->'
        weight: 90
        description: "HTML comment block hiding instructions"
    remediation: |
      Remove all invisible Unicode characters and HTML comments from tool descriptions.

Key fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique rule identifier (e.g., tp-001, sec-045) |
| category | string | One of 16 threat categories |
| severity | enum | critical, high, medium, low |
| confidenceThreshold | number (0–100) | Minimum confidence required to emit a finding |
| patterns | array | One or more pattern objects, each with type, pattern, and weight |
| enabled | boolean | Set to false to disable a rule globally |

Why this structure matters: Rules are not boolean — they are weighted. A rule with a threshold of 70 will not fire on a single weak signal. Multiple signals need to co-occur, or a single high-weight signal must be present. This is what keeps false positive rates low on real codebases.
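Loaded into the engine, a rule can be modeled with a small TypeScript shape. This is a sketch based on the fields documented above; the actual type names inside Firmis may differ:

```typescript
// Hypothetical type names; field names mirror the YAML schema above.
type Severity = "critical" | "high" | "medium" | "low";

type MatcherType =
  | "regex" | "yara" | "file-access" | "import"
  | "network" | "string-literal" | "text";

interface Pattern {
  type: MatcherType;
  pattern: string;
  weight: number;         // 0–100, contribution to confidence
  description?: string;
}

interface Rule {
  id: string;             // e.g. "tp-001"
  name: string;
  description: string;
  category: string;       // one of 16 threat categories
  severity: Severity;
  version: string;
  enabled: boolean;       // false disables the rule globally
  confidenceThreshold: number; // 0–100, minimum confidence to emit a finding
  patterns: Pattern[];
  remediation?: string;
}
```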


Each pattern in a rule specifies a type that determines how the pattern is evaluated against a file. Different threat categories need different matching strategies — a credential path is best matched as a file path, a package name as a string literal, and a suspicious URL pattern as a network pattern.

regex

Applies a JavaScript regular expression to the raw file content. The most common matcher type. Used for secrets, prompt injection phrases, and malware signatures where the exact shape of the dangerous content is known.

Example — regex
- type: regex
  pattern: 'AKIA[0-9A-Z]{16}'
  weight: 100
  description: AWS Access Key ID

yara

Applies a YARA-syntax string match (not the YARA binary). Supports hex strings, wide strings, and simple string conditions. Used for malware signature matching where the YARA ecosystem’s pattern vocabulary is well-established.

Example — yara
- type: yara
  pattern: '"bitcoin" nocase'
  weight: 70
  description: Reference to bitcoin wallet operations

file-access

Matches when the file content contains a reference to a specific file path — typically a sensitive credential file or system path. Path expansion (e.g., ~ → home directory) is applied before matching. This is the primary matcher type for credential harvesting and file system abuse rules.

Example — file-access
- type: file-access
  pattern: "~/.aws/credentials"
  weight: 90
  description: Direct reference to AWS credentials file
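One way to implement the path-expansion step is sketched below, using Node's os.homedir. The function names are illustrative, and the real matcher may normalize more aggressively than this:

```typescript
import * as os from "os";

// Expand a leading "~" to the user's home directory before matching.
// Sketch only — Firmis's actual normalization may differ.
function expandPath(p: string): string {
  return p.replace(/^~(?=$|\/)/, os.homedir());
}

// A file-access pattern matches when the file content references either
// the literal path or its expanded form.
function fileAccessMatch(content: string, pattern: string): boolean {
  return content.includes(pattern) || content.includes(expandPath(pattern));
}
```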

import

Matches when a specific module or package import appears in the file. Handles Python import/from and JavaScript/TypeScript require/import statements. Used for supply chain rules where the presence of a compromised package import is itself the finding.

Example — import
- type: import
  pattern: "paramiko"
  weight: 60
  description: SSH library import — check for unauthorized tunnel creation
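A rough sketch of how an import matcher can distinguish real import statements from incidental mentions of a package name. The regexes here are illustrative, not Firmis's actual patterns:

```typescript
// Detects Python "import x" / "from x import ..." and
// JS/TS "require('x')" / "import ... from 'x'" / "import 'x'".
function importMatch(content: string, moduleName: string): boolean {
  // Escape the module name so it is treated literally inside the regexes.
  const m = moduleName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const python = new RegExp(`^\\s*(import\\s+${m}\\b|from\\s+${m}\\b)`, "m");
  const js = new RegExp(
    `(require\\(\\s*['"]${m}['"]\\s*\\)` +
    `|import\\s+[^;]*from\\s+['"]${m}['"]` +
    `|import\\s+['"]${m}['"])`
  );
  return python.test(content) || js.test(content);
}
```

Anchoring on statement syntax rather than the bare package name is what keeps a code comment mentioning "paramiko" from triggering the rule.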

network

Matches URL patterns or hostname patterns in the file content. Used to detect requests to suspicious TLDs, tunneling services, or known malicious domains. The agent threat landscape involves many domains that have no legitimate use in agent code.

Example — network
- type: network
  pattern: "https?://[^/]*\\.(tk|ml|ga|cf|gq|xyz)/"
  weight: 85
  description: Request to suspicious top-level domain

string-literal

Matches an exact string literal, including surrounding quotes. Used for known-bad package names and exact-match indicators where regex overhead is unnecessary and a partial match would produce false positives.

Example — string-literal
- type: string-literal
  pattern: '"event-stream"'
  weight: 90
  description: event-stream — compromised to steal bitcoin wallets

text

Plain substring search against file content. No regex syntax. Used for simple keyword matches — for example, an exact config flag that disables authentication. Faster than regex for exact string matching.

Example — text
- type: text
  pattern: "DISABLE_AUTH=true"
  weight: 80
  description: Authentication bypass flag set in config
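The text matcher reduces to a plain substring check with no regex compilation. A minimal sketch (the function name is illustrative):

```typescript
// A text pattern is an exact substring check against the file content.
function textMatch(content: string, pattern: string): boolean {
  return content.includes(pattern);
}
```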

This is the core of the engine. Each pattern match produces a weight (0–100). After all patterns in a rule are evaluated against a file, Firmis computes a single confidence score:

confidence = Math.max(ratioConfidence, maxSinglePatternWeight)

ratioConfidence is a scaled score reflecting how many of the rule’s patterns matched:

ratioConfidence = (matchedPatterns / totalPatterns) * averageMatchedWeight

maxSinglePatternWeight is the weight of the highest-weighted individual pattern that matched.

Taking the Math.max of the two ensures that a single very strong indicator (e.g., an exact AWS key pattern at weight 100) always produces a high confidence score even if other patterns in the same rule did not match.

Most security scanners are binary: either a pattern matches or it doesn’t. That produces high false positive rates on real codebases because individual signals are often ambiguous. The confidence model lets Firmis express: “this pattern alone is suspicious but not conclusive — but this other pattern combined with it crosses the threshold.”

Each rule defines a confidenceThreshold. A finding is only emitted when:

confidence >= confidenceThreshold

Rules with multiple low-weight patterns and a high threshold require several patterns to co-occur before a finding is emitted. This reduces false positives for ambiguous indicators.
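The scoring and threshold logic described above can be sketched in a few lines. The weights and the formula come from this page; the function and type names are assumptions, not Firmis internals:

```typescript
interface PatternMatch {
  matched: boolean;
  weight: number; // 0–100
}

// confidence = max(ratioConfidence, maxSinglePatternWeight)
function computeConfidence(results: PatternMatch[]): number {
  const matched = results.filter(r => r.matched);
  if (matched.length === 0) return 0;

  const avgMatchedWeight =
    matched.reduce((sum, r) => sum + r.weight, 0) / matched.length;
  const ratioConfidence = (matched.length / results.length) * avgMatchedWeight;
  const maxSinglePatternWeight = Math.max(...matched.map(r => r.weight));

  return Math.max(ratioConfidence, maxSinglePatternWeight);
}

// A finding is emitted only when confidence >= confidenceThreshold.
function shouldEmit(confidence: number, threshold: number): boolean {
  return confidence >= threshold;
}
```

With a threshold of 70, a lone weight-35 match scores 35 and is suppressed, while a single weight-85 match scores 85 and is emitted.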

Example — data exfiltration rule with threshold 70:

| Signals present | Confidence | Result |
| --- | --- | --- |
| fetch() call alone (weight 35) | 35 | Suppressed — normal code uses fetch |
| fetch() + process.env access (weight 40) | ~37 | Suppressed — still plausibly legitimate |
| Suspicious domain match (weight 85) | 85 | Finding emitted — this is the strong signal |
| fetch() + process.env + suspicious domain | 85 | Finding emitted |

Files with .md and .txt extensions receive a confidence multiplier before threshold comparison:

adjustedConfidence = confidence * 0.15

This reduces noise from rule matches in documentation files, README files, and changelog entries that mention threat patterns in a non-executable context. A pattern that would produce confidence 80 in a .ts file produces confidence 12 in a .md file — well below most thresholds.

Why this matters: Documentation often explains attack techniques. A README that documents how prompt injection works should not fire a prompt injection rule. The 0.15x multiplier handles this without requiring every documentation reference to be suppressed manually.

Exception: The secret-detection category is exempt from this multiplier. A hardcoded secret in .env.example or a README.md is still a real finding because it may be committed to source control and discovered by attackers — even if it was intended as an example.
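The adjustment can be sketched as follows. The 0.15 multiplier, the extensions, and the secret-detection exemption come from this page; the function name and constants are illustrative:

```typescript
const DOC_MULTIPLIER = 0.15;
const DOC_EXTENSIONS = [".md", ".txt"];

// Downweight matches in documentation files, except for secret detection,
// where a hardcoded secret is a real finding regardless of file type.
function adjustConfidence(
  confidence: number,
  filePath: string,
  category: string
): number {
  const isDoc = DOC_EXTENSIONS.some(ext =>
    filePath.toLowerCase().endsWith(ext)
  );
  if (isDoc && category !== "secret-detection") {
    return confidence * DOC_MULTIPLIER;
  }
  return confidence;
}
```

A prompt injection match at confidence 80 in README.md becomes 12, while the same-strength secret match keeps its full score.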


When running a path-based scan (npx firmis scan .), the same file can be indexed by multiple platforms. For example, a shared src/tools/ directory may be picked up by both the claude and mcp analyzers.

Firmis deduplicates findings with the same (ruleId, file, line) triple. The first occurrence is kept; subsequent duplicates are dropped. Without this, a project with 5 detected platforms could report each finding 5 times — inflating counts and making real threats harder to spot.
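The dedup pass amounts to a first-occurrence-wins filter over the (ruleId, file, line) triple. A sketch, with a hypothetical Finding shape:

```typescript
interface Finding {
  ruleId: string;
  file: string;
  line: number;
}

// Keep the first occurrence of each (ruleId, file, line) triple;
// drop subsequent duplicates from other platform analyzers.
function dedupeFindings<T extends Finding>(findings: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const f of findings) {
    const key = `${f.ruleId}\u0000${f.file}\u0000${f.line}`;
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(f);
    }
  }
  return unique;
}
```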


Rules are evaluated in the order they appear in their YAML file, and every enabled rule is applied to each scanned file independently. There is no short-circuit evaluation across rules; all enabled rules are always evaluated.

To skip specific rules, use:

Terminal
# Via CLI flag
npx firmis scan --ignore tp-001,sec-045

# Via .firmisignore (one entry per line)
rule:tp-001
rule:sec-045

  • How It Works — the three-stage pipeline and what happens at each step
  • Threat Model — all 16 threat categories with real attack examples
  • Built-in Rules — full rule listing with IDs, weights, and descriptions
  • Ignoring Findings — how to suppress false positives without weakening your scan coverage