The 24-hour build is the easy part. The hard part is knowing which scanners to run on every PR (one), which to schedule weekly (three), and which directories to exclude before the false-positive count hits four digits. Here’s the prompt that gets it right the first time — and the cost discipline that keeps your monthly GitHub Actions bill at zero.
Companion to “Six Layers of Security Scanning, Built in 24 Hours”. That piece tells the story; this one is the prompt and the operational discipline behind it. See also: “Your Security Scanner Can’t See Your CORS Config” — the targeted OWASP review pass that runs alongside the always-on scanners described here.
Why “Just Turn on Security Scanning” Is the Wrong Mental Model
There are two ways teams approach security scanning. The first is the wrong one: install a tool, point it at the codebase, look at the report, declare victory. The second is the right one: install several overlapping tools, accept that each one finds a different class of problem, tune each one for your codebase’s noise profile, schedule the expensive ones to run weekly instead of on every push, and route the output of all of them into a single dashboard so the security review is one tab, not seven.
The difference between the two approaches is not technical. Both are well-documented; both can be set up in an afternoon by someone competent. The difference is operational discipline — and operational discipline is the part that’s invisible until it bites.
A reader of the original 24-hour-build piece would conclude that the technical work is now a commodity. That’s correct. What that piece doesn’t quite spell out is the meta-work: which scanners overlap on purpose, what to do with the 8,000+ first-pass findings, how to keep your GitHub Actions bill at $0/month while running six scanners on a private repo, and how to triage false positives so you don’t train your team to ignore the security tab forever.
The answers to those questions are not in the official docs of any of the tools. They are in the prompt below.
The technical work is now a commodity. The meta-work — cadence, severity thresholds, exclusion lists, false-positive triage — is the part nobody documents and the part that decides whether the team will actually use the security tab a quarter from now.
The Six Layers (and Why the Overlap Is Deliberate)
The setup is six independent scanners writing into one unified GitHub Security tab via the SARIF format. Each layer catches a different class of problem:
Layer 1 — CodeQL (GitHub’s semantic analysis): traces data flow through your code to find injection vulnerabilities, unsafe deserialisation, and the kind of “user input reaches dangerous API” bugs that no regex can find.
Layer 2 — OSV Scanner (Google’s database): checks every npm/composer/pip dependency against Google’s Open Source Vulnerability database. Different sources than the next one.
Layer 3 — Trivy (Aqua Security’s database): checks the same dependencies against a different CVE database. Intentional overlap — the two databases don’t always have the same vulnerabilities, and a dependency miss is a dependency miss.
Layer 4 — Semgrep (pattern-based): runs hundreds of OWASP-derived rules against the codebase as text patterns. Finds things like missing CSRF tokens, unsafe DOM sinks, weak crypto.
Layer 5 — Dependabot (GitHub-native, proactive): opens PRs for vulnerable npm packages, vulnerable GitHub Actions, and vulnerable Docker base images as soon as advisories drop. The only one of the six that PROPOSES fixes rather than just reporting findings.
Layer 6 — GitHub Secret Scanning (repo-settings toggle): catches keys committed in git history, including historical ones. Not a workflow file — a settings checkbox.
The headline lesson is that the overlap is intentional: a single scanner gives you false confidence in proportion to how much you trust its database. Different scanners using different databases catch different things, and the cost of the overlap is one dashboard with a few duplicate rows, not multiple dashboards. That’s a price worth paying.
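Layer 5 is the only layer driven by a config file rather than a workflow. A minimal sketch of what that file might look like, assuming the npm manifest and a Dockerfile both sit at the repo root (the `directory` values are assumptions; adjust them to your layout):

```yaml
# .github/dependabot.yml
# Sketch only: the ecosystems match the ones named in the article;
# the directory paths are assumed, not taken from the original repo.
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"            # assumes package.json at the repo root
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"            # "/" covers .github/workflows
    schedule:
      interval: "weekly"
  - package-ecosystem: "docker"
    directory: "/"            # assumes a Dockerfile at the repo root
    schedule:
      interval: "weekly"
```

No `groups` block is defined, so updates arrive as one PR per package. Advisory-triggered security updates are switched on separately in the repository's security settings; this file covers the scheduled checks.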
The One Prompt That Sets It All Up
Copy this into Claude Code (or Codex) at the root of any repo:
Set up a six-layer security scanning stack for this codebase. All
findings should land in the GitHub Security tab via SARIF upload, on a
schedule that respects the GitHub Actions minutes free-tier budget.
═══ PHASE 0 — DISCOVER FIRST ═══
Before writing any workflow files, inspect the codebase and write your
findings to `docs/security-scanning.md`. Cover:
- The languages / ecosystems present (we used TypeScript/JavaScript
+ Next.js + GitHub Actions YAML + a C# sidecar; yours might be
Python, Go, Ruby, mixed monorepo, etc.)
- The dependency manifests in the repo (`package.json`, `Pipfile`,
`go.mod`, `pom.xml`, `Cargo.toml`, `composer.json`, etc.) — these
drive what OSV and Trivy will scan
- Any Dockerfiles, base images, and container build pipelines that
Dependabot + Trivy will need to know about
- Vendored or generated directories that should be excluded from
scanning (we had `node_modules`, plus seven project-specific dirs
like `html-static`, `claudecodeui-origin`, `knowledge-source`).
Identifying these UP FRONT is the most important false-positive
fight — see GOTCHA 2 below.
- Whether the repo is public or private (affects GitHub Actions
free-tier minute budget and whether GitHub Advanced Security is
bundled or paid)
- Existing CI workflows in `.github/workflows/` (so the new security
workflows don't collide with them on naming or schedule)
Write findings + the implementation plan to `docs/security-scanning.md`.
Stop. I'll confirm before you implement.
═══ PHASE 1 — IMPLEMENT (after spec approval) ═══
1. CODEQL — `.github/workflows/codeql.yml`. Matrix over languages
discovered in Phase 0. Trigger: push to main + pull_request to main
+ weekly schedule (cron — pick an off-hours slot) + workflow_dispatch.
IMPORTANT: include `language: actions` in the matrix even if your
primary stack isn't Actions — CodeQL scans your workflow YAML files
themselves for command-injection and untrusted-input flow. Most
teams don't know this language exists. See GOTCHA 3.
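Use `permissions: {}` at workflow level + minimal per-job permissions
(e.g. `security-events: write` on the job that uploads SARIF). GitHub
Actions has no per-step permissions. Never grant blanket write access.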
2. OSV SCANNER — `.github/workflows/osv.yml`. Use Google's reusable
workflow `google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@v2.x`.
Trigger: weekly schedule + workflow_dispatch ONLY. Not on every
PR. See GOTCHA 1 (cadence + costs).
`scan-args` should include `--recursive` and `--skip-dirs` for
each vendored directory identified in Phase 0.
3. TRIVY — `.github/workflows/container-scan.yml`. Use
`aquasecurity/trivy-action@0.x`. Filesystem scan (`scan-type: 'fs'`),
SARIF output. Trigger: weekly + workflow_dispatch.
CRITICAL: set `severity: 'HIGH,CRITICAL'` — without this, Trivy
reports every MEDIUM finding in your dependency tree and the
security tab becomes unusable. See GOTCHA 4.
4. SEMGREP — `.github/workflows/semgrep.yml`. Run in the `semgrep/semgrep`
container, use `--config auto` (loads 500+ OWASP rules), SARIF output.
Trigger: weekly + workflow_dispatch.
ADD `if: github.actor != 'dependabot[bot]'` at the job level —
stops Semgrep from running on every Dependabot PR (which just bumps
a version and doesn't change code patterns).
`--exclude=` every vendored directory from Phase 0.
5. DEPENDABOT — `.github/dependabot.yml`. Weekly schedule for `npm`,
`github-actions`, and `docker` ecosystems at minimum. Add others
per the manifests Phase 0 found. Open one PR per vulnerable
package, not a batch — easier to review and roll back.
6. SECRET SCANNING — NOT a workflow file. Enable via repo settings
(Settings → Security → Code security → Secret scanning + Push
protection). Document this in `docs/security-scanning.md` as a
manual setup step. Push protection blocks committing recognised
secrets before they reach the remote.
7. SARIF UNIFICATION — every workflow (1-4) ends with
`github/codeql-action/upload-sarif@v4` so all findings land in
the same GitHub Security tab. The CodeQL action handles SARIF
upload natively; Trivy and Semgrep output SARIF that the
upload-sarif action then ingests. This is the contract that
makes "one dashboard, multiple scanners" work.
═══ PHASE 2 — VERIFY before shipping ═══
- All four scheduled workflows run cleanly via workflow_dispatch
BEFORE merging. Fix any setup errors while the Security tab is still
empty, not after the team learns to ignore failing badges.
- The Security tab shows findings categorised by scanner (CodeQL,
Trivy, Semgrep, OSV separately).
- The false-positive count on the first run is in the low thousands
or below. If you have 8,000+ findings on first scan, the
vendored-directory exclusions in Phase 0 missed a dir. Fix the
exclusions BEFORE triaging individual findings.
- The next billing cycle's Actions-minutes usage is under 20% of
the free tier (about 400 of the 2,000 minutes/month that private
repos get). If higher, a scheduled scan is misconfigured to PR-trigger.
Ship as 4 small PRs (one per workflow), not one mega-PR. Each PR
independently revertable.
Adapt the specifics — swap the scanners for your CI platform’s equivalents if you’re not on GitHub Actions (GitLab has its own native scanners; Bitbucket runs SARIF through Atlassian’s Code Insights; CircleCI has Snyk integration). The phase structure works the same way.
Notice What the Prompt Is Doing
- Discovery first. Phase 0 forces the agent to enumerate the actual languages, manifests, container files, and (critically) the vendored directories before writing a single workflow. Most teams’ first scan returns 5,000+ findings because they scanned node_modules or a vendored fork. The Phase 0 exclusion list is the single most valuable artefact in this whole setup.
- Cadence is part of the spec, not an afterthought. Every numbered section names its trigger condition (per-PR vs weekly cron). The cost discipline lives in the spec, not in a comment-after-the-fact. See GOTCHA 1.
- Severity filters are mandatory, not defaults. Trivy’s default severity is “everything”, which produces noise that buries the actionable findings. The prompt enforces HIGH,CRITICAL only: overrideable later, but you start tight and loosen, never the other way.
- SARIF as the unifying contract. Section 7 calls out the upload-sarif action explicitly. Without it you’d have four scanners reporting into four places. With it, one dashboard, one review surface, one place to suppress false positives.
- “Don’t run on Dependabot PRs” named inline. Small detail, large saving: Semgrep running on every Dependabot PR (10–30+ per week on an active repo) burns minutes for zero new findings. Most teams hit this only after the bill arrives.
- The verification phase audits the bill. Phase 2 includes “next billing cycle’s minutes usage is under 20% of free tier.” That’s an actionable check, not a vibe, and it closes the loop on the cost discipline the prompt was written to enforce.
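To make the Dependabot guard and the SARIF contract concrete, here is a hedged sketch of the Semgrep workflow that item 4 of the prompt describes. The cron slot, job name, output filename, and `actions/checkout@v4` pin are assumptions; the excluded directories are the ones the article names.

```yaml
# .github/workflows/semgrep.yml (sketch, not the original file)
name: Semgrep
on:
  schedule:
    - cron: '40 3 * * 3'      # assumed slot: Wednesday 03:40 UTC
  workflow_dispatch:
permissions: {}               # deny by default, grant per job below
jobs:
  semgrep:
    # Skip runs started by Dependabot, as the prompt requests
    if: github.actor != 'dependabot[bot]'
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    permissions:
      contents: read
      actions: read           # needed for SARIF upload on private repos
      security-events: write  # lets the job write to the Security tab
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep with vendored directories excluded
        run: >-
          semgrep scan --config auto --sarif --output semgrep.sarif
          --exclude=node_modules --exclude=html-static
          --exclude=claudecodeui-origin --exclude=knowledge-source
      - name: Upload findings to the Security tab
        if: always()          # upload even when the scan step fails
        uses: github/codeql-action/upload-sarif@v4
        with:
          sarif_file: semgrep.sarif
          category: semgrep   # keeps Semgrep rows separate from other scanners
```

The `category` value is what lets the Security tab show findings per scanner, which is one of the Phase 2 checks.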
What Actually Goes Wrong (Real Gotchas From the Setup)
The prompt above didn’t come out of theory. Each numbered gotcha below cost an iteration or a re-tune to discover.
Gotcha 1: GitHub Actions minutes are the real budget. Schedule expensive scans weekly, not per-PR.
Trivy, Semgrep, and OSV each take 2–10 minutes per run. CodeQL takes 5–15. If you trigger all four on every push and PR — which is the default in most “getting started” docs — and you have 15–25 PRs per week, you’re looking at roughly 1,500–2,000 minutes per month from security scanning alone. Private repos get 2,000 free Actions minutes per month before billing kicks in at $0.008/minute. You’ll burn the entire free tier on scans that didn’t find anything new.
The actual right cadence: CodeQL on every push + PR, because it analyses code that changed; OSV, Trivy, Semgrep on weekly cron, because they scan databases of known vulnerabilities. New CVEs appear days or weeks after a release, not seconds.
The original setup ran on a Wednesday-03:00 staggered cron. That’s ~80 minutes/month of security scanning — under 5% of the free tier. Lesson: the cost model is the part of security scanning that isn’t documented anywhere. Pick the cadence first, then write the workflow.
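As an illustration of a weekly-only trigger, here is a sketch of the OSV workflow with no `push` or `pull_request` events at all. The `@v2.x` ref is the article's own placeholder and has to be pinned to a real release; the cron offset is an assumption chosen so the scanners do not all start in the same minute.

```yaml
# .github/workflows/osv.yml (sketch; triggers are the point here)
name: OSV Scanner
on:
  schedule:
    - cron: '20 3 * * 3'      # Wednesday 03:20 UTC; GitHub cron runs in UTC
  workflow_dispatch:          # manual run for the Phase 2 verification
permissions: {}
jobs:
  scan:
    # Placeholder ref copied from the article; replace with a real tag
    uses: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@v2.x
    with:
      scan-args: |-
        --recursive
        ./
    permissions:
      contents: read
      actions: read
      security-events: write
```

Giving each weekly workflow its own minute offset (for example 03:00, 03:20, 03:40) is one way to read "staggered"; the original schedule beyond Wednesday 03:00 is not specified in the article.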
Gotcha 2: Vendored directories are most of the false-positive fight.
The original baseline scan found 8,277 Semgrep findings. 7,381 of them were noise — almost all from documentation HTML files (missing Subresource Integrity attributes on CDN script tags in static examples), vendored third-party code, or generated fixture data. The actionable findings were under 900 — still a lot, but a triagable lot, not a “the team will scroll past this forever” lot.
The fix wasn’t smarter triage. The fix was naming the directories that shouldn’t be scanned in the first place: node_modules, generated docs, vendored forks, fixture data, integration sample apps. Each scanner has a different exclusion syntax (--skip-dirs for Trivy and OSV, --exclude= for Semgrep, paths-ignore in the CodeQL config).
Lesson: the right first scan finds hundreds of items, not thousands. If your first scan returns a four-digit finding count, you haven’t tuned your exclusions, and you’ll never recover the team’s attention because they’ll learn to ignore the security tab. Tune the exclusions before you triage the findings.
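For the CodeQL side of the exclusion list, a minimal sketch of a config file using paths-ignore. The file location is a common convention and an assumption here; the directory names are the ones the article mentions.

```yaml
# .github/codeql/codeql-config.yml (assumed path)
# Referenced from the workflow's init step via:
#   with:
#     config-file: ./.github/codeql/codeql-config.yml
name: "CodeQL config with vendored directories excluded"
paths-ignore:
  - node_modules
  - html-static
  - claudecodeui-origin
  - knowledge-source
```

Keeping the list in one reviewed file makes it easier to copy the same directories into the other scanners' exclusion flags when a new vendored directory appears.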
Gotcha 3: CodeQL has a language: actions mode that scans your workflows.
GitHub Actions workflow YAML files are a notorious source of command-injection vulnerabilities: anywhere you interpolate ${{ github.event.* }} or ${{ github.head_ref }} directly into a run: command, you may have a workflow-injection vulnerability that lets a contributor’s PR title or branch name execute arbitrary shell on your CI runner. Real exploits have happened. The bug class is well documented but not widely known.
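A small illustration of the pattern, with hypothetical job and step names. The first step is the vulnerable shape; the second shows the usual mitigation of passing the value through an environment variable so the shell never sees it as script text.

```yaml
# Illustrative only: job and step names are made up for this example
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      # Vulnerable: the expression is expanded into the script before the
      # shell runs, so a crafted PR title becomes part of the command
      - name: Unsafe use of PR title
        run: |
          echo "PR title: ${{ github.event.pull_request.title }}"

      # Safer: the value arrives as data in an environment variable and
      # is only ever referenced as a quoted shell variable
      - name: Safer use of PR title
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: |
          echo "PR title: $PR_TITLE"
```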
CodeQL has a dedicated language: actions mode that scans for exactly this. Add it to the matrix alongside your application languages. It’s free (CodeQL runs on the same workflow), takes under a minute on most repos, and catches a class of bug that no other scanner in this stack finds.
Lesson: the most important scanner is sometimes scanning the thing you didn’t think of as code. Workflow YAML is code. Treat it as such.
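A hedged sketch of a CodeQL workflow with the actions language in the matrix. The language list mirrors the stack named in the article (TypeScript/JavaScript, C#, Actions); the job name, the `actions/checkout@v4` pin, and the `build-mode: none` choice are assumptions. GitHub Actions accepts `permissions` only at workflow or job scope, so the sketch denies everything at the top and grants per job.

```yaml
# .github/workflows/codeql.yml (sketch, not the original file)
name: CodeQL
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 3 * * 3'       # Wednesday 03:00 UTC
  workflow_dispatch:
permissions: {}
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: read           # required on private repos
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        include:
          - language: javascript-typescript
            build-mode: none
          - language: csharp
            build-mode: none  # assumption: switch to a build if results look thin
          - language: actions # scans the workflow YAML itself
            build-mode: none
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v4
        with:
          languages: ${{ matrix.language }}
          build-mode: ${{ matrix.build-mode }}
      - uses: github/codeql-action/analyze@v4
        with:
          category: "/language:${{ matrix.language }}"
```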
Gotcha 4: Trivy without severity: 'HIGH,CRITICAL' will drown you.
Trivy’s default behaviour is to report every CVE it finds across all severities. On a normal-sized Next.js codebase that’s hundreds of MEDIUM findings in transitive dependencies whose vulnerable code paths you never call, plus LOW findings for the generic denial-of-service exposure that every web framework accumulates over its lifetime.
Setting severity: 'HIGH,CRITICAL' on the action immediately collapses this to the dozen or two findings that actually matter — and “actually matter” means “if this is exploitable in your app’s specific code path, you have a real problem.” MEDIUM and LOW are signal for a dedicated security team that has time to evaluate exploit reachability; for a small team they’re noise that gets ignored.
Lesson: you can always lower the threshold later. Start at HIGH+CRITICAL, build the triage muscle on a small finding set, then expand. Most teams start at the default and never expand because the default already broke their attention budget.
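A sketch of the Trivy steps with the severity filter applied. The `@0.x` ref is the article's placeholder and must be pinned to a real release; the output filename is an assumption. One caveat to verify against the version you pin: as I understand the action's README, SARIF output includes all severities unless `limit-severities-for-sarif` is also set, so the sketch sets both.

```yaml
# Steps for .github/workflows/container-scan.yml (sketch)
steps:
  - uses: actions/checkout@v4
  - name: Trivy filesystem scan
    uses: aquasecurity/trivy-action@0.x   # placeholder ref from the article
    with:
      scan-type: 'fs'
      scan-ref: '.'
      format: 'sarif'
      output: 'trivy-results.sarif'
      severity: 'HIGH,CRITICAL'
      # Assumption to verify: needed for the filter to apply to SARIF output
      limit-severities-for-sarif: true
      skip-dirs: 'node_modules,html-static,claudecodeui-origin,knowledge-source'
  - name: Upload findings to the Security tab
    uses: github/codeql-action/upload-sarif@v4
    with:
      sarif_file: 'trivy-results.sarif'
      category: trivy
```

If MEDIUM findings still appear in the Security tab after the first run, the SARIF severity behaviour is the first thing to check.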
The right first scan finds hundreds of items, not thousands. If your first scan returns a four-digit finding count, you haven’t tuned your exclusions, and you’ll never recover the team’s attention.
What This Costs
About four hours of an engineer’s attention spread across an afternoon. The agent generates the four workflow YAMLs + the Dependabot config; the human picks the cron times, picks the severity threshold, enumerates the vendored directories from a quick walk through the repo, and toggles the two repo-settings checkboxes (secret scanning + push protection) that aren’t code.
The output is a permanent zero-marginal-cost security baseline. Roughly 80 minutes of GitHub Actions runtime per month against a 2,000-minute free tier. The first run produces a triagable finding set — typically a few hundred items the first time, dropping to a handful per week thereafter as new CVEs land in the databases. If you’ve never done this on a codebase that’s more than a year old, expect the first triage pass to take a day; subsequent weeks take 15–30 minutes.
The Broader Point
The original 24-hour-build piece argued that security scanning is no longer a specialised discipline requiring dedicated headcount. That’s true at the implementation layer. What this companion piece adds is that there’s a discipline layer above implementation — cadence, severity thresholds, exclusion lists, false-positive triage strategy — that’s just as important and just as poorly documented.
That discipline layer is the part an AI agent doesn’t get right on its first instinct. It will install all six scanners, leave Trivy at default severity, run everything on every PR, and produce a beautiful first-scan dashboard with 8,000 findings that nobody will ever look at twice. That outcome looks correct to the agent because it followed the official getting-started docs of every tool.
What gets it right is a prompt that names the operational defaults — cadence, severity, exclusions, SARIF unification — as first-class requirements alongside the tool installations themselves. That’s what the prompt above does. The work it produces in an afternoon is not “six scanners installed.” It’s “six scanners installed with the cost model figured out and the noise pre-tuned.” That’s the difference between a security stack the team will actually use and one that gets dismissed within a quarter.
Built on a Next.js + .NET sidecar codebase deployed on GitHub Actions, with all six layers running against the same repo for ~$0/month in Actions minutes. The work was done with Claude Code (Opus 4.7) in a single afternoon — four workflow PRs, one Dependabot config, two repo-settings checkboxes — with about three hours of human review and decision time on top. The 24-hour timeline from the companion piece is real, including the false-positive triage that turned 8,277 first-pass findings into a triagable set.