[Security hardening] Add automated security audit workflow#2442
[Security hardening] Add automated security audit workflow#2442PascalThuet wants to merge 22 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new GitHub Actions workflow to introduce automated security checks for the Python codebase, aiming to catch dependency advisories and high-severity static-analysis findings in CI alongside the existing test/lint workflows.
Changes:
- Add
.github/workflows/security.ymlwith a dedicatedSecurity Auditworkflow. - Run
pip-auditon pushes tomain, pull requests, a weekly cron, and manual dispatch. - Run Bandit against
src/, temporarily skippingB602pending the shell-step hardening tracked in #2440.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/security.yml |
New CI workflow that adds dependency-audit and static-analysis jobs for the Python project. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 5
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable, please explain why
|
Addressed the Copilot feedback in the latest push:
Local validation: uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkResults: |
|
Added automated regression coverage for the security workflow in The new tests statically verify that:
Validation after this commit: uv run python -m pytest tests/test_security_workflow.py -q
uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkAll passed locally. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
src/specify_cli/init.py:414
- This second
# nosec B602suppresses Bandit for the non-capturing branch as well, so neither path throughrun_commandwill ever raise B602 again. If the intent is only to defer the current finding until #2440, the skip needs to stay in the workflow/configuration layer rather than permanently muting this call site in source.
# shell=True is only available to callers that opt in explicitly.
subprocess.run(cmd, check=check_return, shell=shell) # nosec B602
- Files reviewed: 5/5 changed files
- Comments generated: 5
|
Please address Copilot feedback. If not applicable, please explain why. Note the shell step should indeed be ignored |
|
Addressed the follow-up review in
Validation: uv export --quiet --extra test --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py tests/test_workflows.py -q
uvx ruff check src/
git diff --checkI also verified the |
|
One more small cleanup after re-review: pushed Reason: it still audits the runtime + Revalidated locally: uv pip compile pyproject.toml --extra test --quiet --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py -q
git diff --checkAll passed. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
tests/test_security_workflow.py:91
- Bandit doesn't require the exact literal
# nosec B602to suppress this finding; plain# nosec,#nosec, and multi-ID forms also disable B602. This check leaves those suppression paths untested, so a future source-level bypass can slip in without failing CI.
)
def test_b602_is_not_suppressed_in_source(self):
source_text = "\n".join(
path.read_text(encoding="utf-8")
for path in (REPO_ROOT / "src").rglob("*.py")
)
- Files reviewed: 5/5 changed files
- Comments generated: 3
|
Please address Copilot feedback |
|
Addressed the latest Copilot review in
Validation: uv run python -m pytest tests/test_security_workflow.py -q
uv pip compile pyproject.toml --extra test --python-version 3.11 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py311.txt
uv pip compile pyproject.toml --extra test --python-version 3.12 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py312.txt
uv pip compile pyproject.toml --extra test --python-version 3.13 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py313.txt
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py311.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py312.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py313.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
git diff --checkAll passed locally. |
mnriem
left a comment
There was a problem hiding this comment.
Please address workflow errors
Resolve conflicts in src/specify_cli/__init__.py and src/specify_cli/integrations/catalog.py: - __init__.py: drop local definitions of run_command, check_tool, is_git_repo, init_git_repo, handle_vscode_settings, merge_json_files, _locate_core_pack, _repo_root, _locate_bundled_extension, _locate_bundled_workflow, _locate_bundled_preset. These now live in the new _utils.py / _assets.py modules introduced on main and are re-imported at the top of __init__.py. Keep the security-hardening read_response_limited import. - _utils.py: apply the security hardening from this branch to the relocated run_command — drop the shell= parameter so subprocess is never invoked via /bin/sh. - integrations/catalog.py: keep both imports (read_response_limited from this branch + CatalogEntry/CatalogStackBase from main).
Six follow-on checks that lock in the hardening from this PR and add the surfaces it didn't cover: 1. ruff S602/S604/S605 in pyproject.toml — fail PRs that reintroduce subprocess shell=True. The intentional shell=True in the workflows shell step keeps its NOTE comment and gets an explicit `# noqa: S602` so the deviation is visible. 2. Bandit two-pass in security.yml — keep `-lll --baseline` blocking and add a non-blocking `-ll` informational pass so MEDIUM findings show in the job summary instead of accumulating silently. 3. Bandit baseline diff check — fail PRs that grow .github/bandit-baseline.json unless they carry the `security-baseline-change` label. New script in .github/scripts/check_bandit_baseline.py. 4. Secret scanning via detect-secrets — new `secret-scan` job in security.yml with a committed .secrets.baseline that whitelists the nine current findings (all SHA pins / docs examples / test fixtures; audited before commit). Drift fails the check. 5. shellcheck on scripts/bash/*.sh in lint.yml. Starts at --severity=error to catch real bugs; style (SC2155) can be tightened in a follow-up. 6. macos-latest added to the dependency-audit matrix in security.yml — aligns with test.yml's posture and catches platform-specific resolver surprises.
|
Hi @mnriem — quick status update and a small ask. Conflicts resolved ( Follow-on PR checks (
Tests: 2968 pass locally on the merged tree. Ask: the four workflows ( |
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable, please explain why. And also fix test and lint errors
Copilot review #4291726625 + mnriem CHANGES_REQUESTED #4292842064. Hardening (Copilot suggestions on existing code): - Pass strict_redirects=True to open_url() at the three catalog/workflow download call sites (__init__.py preset/workflow downloads, integrations/catalog.py). Closes an HTTPS->HTTP downgrade window where the bounded read could happen on a redirected http:// target before the post-redirect URL validation. Lock-in fixes for the new PR checks: - check_bandit_baseline.py: compare result identities (filename + line + test_id + severity + confidence + code-hash) instead of raw counts so a PR can't silently swap one whitelisted finding for another. Also treat "baseline file absent at base ref" as introduction (no label required) instead of growth-from-zero. - Switch secret-scan to detect-secrets-hook (instead of `scan --baseline` followed by `git diff --exit-code`). The scan command rewrites the baseline's generated_at timestamp on every run, so the diff guard always tripped. detect-secrets-hook only reports findings that aren't in the baseline, so the diff guard is unnecessary. Brittle-test fixes: - security.yml: revert macos-latest from the dependency-audit matrix (test_security_workflow.py:160 pins ubuntu+windows, matching test.yml). - security.yml: rename "Run Bandit (HIGH, baseline-gated)" back to "Run Bandit" (test_security_workflow.py:188 expects the canonical name); the medium-severity informational pass keeps a distinct name. - security-audit-requirements.txt: regenerate with uv pip compile — pyproject.toml changed on this branch (ruff config in the previous commit) and upstream package releases drifted the lock; check_security _requirements.py was rightly failing until both sides matched. Pre-existing pinning gap caught by tests/test_github_workflows.py: - Pin actions/github-script@v9 to its commit SHA in catalog-assign.yml. - Fix USES_RE in test_github_workflows.py so it matches the `- uses:` shorthand form (without it, catalog-assign.yml's `@v9` slipped past). Test mocks: - test_integration_catalog.py: extend the three url-mocking helpers to also stub OpenerDirector.open. open_url(strict_redirects=True) takes a different code path that bypasses the urlopen mock; patching the opener covers both paths.
Polishes from the previous review pass that I noticed after pushing. - security.yml: drop the unneeded fetch-depth: 0 from the secret-scan checkout. detect-secrets-hook reads the working tree only — fetching full history slows the job without adding signal. - security.yml: add a follow-on step that surfaces the Bandit medium- severity informational pass in $GITHUB_STEP_SUMMARY. With continue-on-error: true the previous step never marks the job yellow/red, so findings were buried in the log; the summary now flags them with a⚠️ heading (or ✅ when clean) at the top of the run page. - CONTRIBUTING.md: document the new tooling and gates so contributors don't bounce off CI: - detect-secrets-hook command + how to regenerate .secrets.baseline - the bandit baseline label gate (security-baseline-change) - shellcheck --severity=error invocation - explicit note that committed security-audit-requirements.txt can drift purely from upstream package releases and needs periodic regeneration even on unrelated PRs.
Four hardening / robustness items raised during self-review of the PR. - check_bandit_baseline.py: normalize whitespace in the code-snippet hash that's part of each entry's identity. A bandit version bump that reformats the snippet (different number of context lines, different indentation) would otherwise make every baseline entry look "new", forcing the security-baseline-change label on every unrelated PR. - security.yml + check_secrets_baseline.py: symmetric growth gate on .secrets.baseline. detect-secrets-hook already blocks unknown secrets, but extending the baseline (whitelisting a new finding) was silent. Mirror the bandit gate — PR must carry secrets-baseline-change to acknowledge any new identity (filename + line + type + hashed_secret). - test_security_workflow.py: drop the brittle exact-name lookup for the blocking bandit step. The test now finds it by the baseline-arg signature, so future renames of the step don't silently bypass the --skip B602 check. Added _find_step_by_run_signature helper that insists on exactly one match. Strict assertions on OS matrix and tool version pins are kept — those are intentional security choices. - workflows/PUBLISHING.md: the shell-step NOTE in src/specify_cli/workflows/steps/shell/__init__.py points authors here for "security guidance", but the section didn't exist. Added an explicit "Security: shell steps execute arbitrary code" subsection covering the no-sandbox model, the inspect-before-install obligation, input-interpolation hygiene, and reviewer expectations.
Main extracted self/version logic into a new src/specify_cli/_version.py module (d947fda "refactor: extract _version.py from __init__.py (PR-3/8)"). This branch had its own hardening on _fetch_latest_release_tag (using read_response_limited from _download_security). Resolution: - src/specify_cli/__init__.py: drop the HEAD block that defined _get_installed_version, _normalize_tag, _is_newer, _fetch_latest_ release_tag, self_check, self_upgrade — all relocated to _version.py on main and already re-imported at the top of __init__.py. Keep main's single line app.add_typer(_self_app, name="self"). - src/specify_cli/_version.py: apply the read_response_limited hardening to the relocated _fetch_latest_release_tag so the bounded read protection isn't lost in the move. All 2994 tests pass after merge.
Two items from the second self-review: - workflows/PUBLISHING.md: fix invented command name. The first draft recommended `specify workflow inspect <name>` which doesn't exist — the actual subcommand is `workflow info`, and even that only shows metadata (name/version/inputs/step IDs+types), never the shell `run` content. Replace with explicit guidance to read the raw workflow.yml directly when auditing shell steps. - tests/test_baseline_gates.py: new file. 12 unit tests covering both check_bandit_baseline.py and check_secrets_baseline.py — no PR base ref, introduction (baseline absent at base), identical baselines, growth without ack label, growth with ack label, swap attack (constant count, new identity), and (bandit-only) whitespace-only drift in the code snippet hash. The latter verifies the normalization added earlier protects against bandit reformatting its output.
Three polish items from the third self-review pass. - tests/test_upgrade.py: new TestBoundedRead class. Pins the contract that _fetch_latest_release_tag wraps the response body through read_response_limited with max_bytes=1024*1024. Protects the hardening against a silent revert to `resp.read()` in a future refactor (the extraction to _version.py during the last merge would have lost it if we hadn't caught it manually). - tests/test_baseline_gates.py: replace the two near-identical test classes with a parametrized TestSharedBaselineGate (bandit + secrets via a GateConfig dataclass and a `gate` fixture). Bandit-only quirks (no-base-ref short-circuit, whitespace-normalized identity) stay in TestBanditSpecific. Removes ~80 lines of duplication; the two scripts now exercise the same scenarios by construction so a divergence is caught the moment one drifts. - tests/test_baseline_gates.py: new shared scenario test_corrupt_json_at_base_falls_back_to_empty. Covers the except JSONDecodeError branch of _read_baseline_at — corrupt base doesn't crash the script; instead the head set becomes "all new" and the normal label gate fires. Was previously dead code from a coverage standpoint. 3009 passed (up from 3006 — 14 baseline tests now parametrized as 12 + 2 bandit-specific, plus 1 new bounded-read test).
Three micro-cleanups raised during review github#4 of my own work — no behavior change, just clarity. - Replace the __import__("specify_cli._download_security", fromlist=...) dance with a plain `import ... as _real_read_response_limited` at the top of the file. Easier to grep, no runtime difference. - Type the recorded dict explicitly and make max_bytes/label keyword- only without defaults on the spy. If a future refactor drops either argument the spy now raises TypeError immediately, instead of silently recording None and tripping the post-call assertion with a more confusing message. - Tighten the label check from fuzzy substring match ("github" in label.lower()) to exact equality ("GitHub latest release"). Both catch regressions; exact equality also catches typos.
Six items from the new Copilot pass. Three were latent bugs in the guardrails added by earlier commits, two are documentation/wording, one is parity coverage. Bugs - security.yml: the MEDIUM Bandit informational pass ran without --baseline, so the whitelisted HIGH B602 finding re-fired there on every run, turning the job summary into a permanent warning. Apply the same baseline to both passes; medium-only NEW findings now surface, as intended. - security.yml: the summary step ran with if: always() but the MEDIUM pass has the default if: success() — when the blocking HIGH step fails, the MEDIUM pass is skipped (outcome=skipped, not failure) and the summary wrote "✅ clean" anyway. Switch to a case statement that handles failure/success/skipped distinctly (⚠️ / ✅ / ⏭️). - check_bandit_baseline.py and check_secrets_baseline.py used `git show <head_ref>:` on both sides, so an unreadable/unfetched head ref returned empty results and the diff computed 0 new identities → fail-open. Read the head side from the working tree instead (CI is checked out at the PR head), fail-closed when the file is missing, and SystemExit on corrupt JSON. The base side keeps the lenient JSONDecodeError fallback because that's historical state we can't change. Wording - security.yml + CONTRIBUTING.md: both mentioned `# nosec` as a suppression mechanism, but tests/test_security_workflow.py:: test_bandit_nosec_is_not_suppressed_in_source explicitly forbids `# nosec` under src/. Replace with the actually-supported paths (bandit baseline for HIGH findings, `# noqa: S6xx` for ruff subprocess-shell rules) and flag the forbidden-comment policy. Parity coverage - tests/test_security_workflow.py: three new tests for the secret-scan job mirroring the dependency-audit / static-analysis coverage — detect-secrets-hook command, baseline path, excluded paths, growth gate env wiring (BASE only, no HEAD env), and fetch-depth: 0. - tests/test_workflows.py: regression test that WorkflowCatalog._fetch_single_catalog routes through read_response_limited with error_type=WorkflowCatalogError and label "workflow catalog". Mirrors TestBoundedRead for _fetch_latest_ release_tag and the equivalent test in test_integration_catalog.py. - tests/test_baseline_gates.py: two new fail-closed cases (head missing in working tree, head corrupt in working tree); drop the now-unused head_sha returns and the head env var from GateHandle.run. Note: Copilot also flagged "no tests on baseline gate scripts" — those tests already shipped in tests/test_baseline_gates.py (commit 2fd8071, posted before the review). Updated here with the new fail-closed cases. Tests: 3017 passed (was 3009).
Summary
Security context
This creates a repeatable CI baseline for dependency advisories and high-severity static-analysis issues while preserving a scheduled live-resolution signal for newly published dependency problems. It also closes similar supply-chain and archive-handling gaps found during follow-up review.
Closes #2438
Validation