Prompts That Ship · Part 4
June 8, 2026
11 min

Your Security Scanner Can’t See Your CORS Config. Here’s the One-Prompt OWASP Review That Can.


Automated scanners find CVE matches in your dependencies. They don’t find “JWT tokens never expire,” “CORS allows all origins,” or “Multer accepts any file type” — because those aren’t bugs, they’re configuration choices nobody questioned. Here’s the prompt that runs a 12-category OWASP review on any codebase, names what it finds with file:line citations, and converts the findings into a reusable skill so the audit doesn’t have to start from scratch next quarter.

The Class of Bugs Your Scanner Will Never Tell You About

You can run CodeQL, Trivy, OSV, Semgrep, and Snyk against the same codebase and never find that your JWT tokens have no expiration. That’s not a scanner failure — it’s a category mismatch. The absence of { expiresIn: '24h' } as a third argument to jwt.sign() is not a vulnerable code pattern. It’s a missing safety net. Scanners look for patterns; this is a missing pattern. Different problem.
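To make the “missing pattern” point concrete, here is a minimal sketch of the check a pattern scanner has no reason to run: find every jwt.sign( call and flag the ones whose arguments never mention expiresIn. The function name is hypothetical. The parenthesis matching is deliberately naive and assumes no parentheses inside string arguments; a production check would walk the AST instead.

```typescript
// Hypothetical helper: flag every jwt.sign(...) call whose argument text
// never mentions expiresIn. Naive by design: it balances parentheses but
// ignores strings, comments, and options objects defined elsewhere.
interface MissingExpiry {
  line: number; // 1-based line where the call starts
  call: string; // full call text, for the report
}

function findSignWithoutExpiry(source: string): MissingExpiry[] {
  const needle = "jwt.sign(";
  const results: MissingExpiry[] = [];
  let from = 0;
  for (;;) {
    const start = source.indexOf(needle, from);
    if (start === -1) break;
    // Walk forward from the opening parenthesis until depth returns to zero.
    let depth = 0;
    let end = start + needle.length - 1;
    for (; end < source.length; end++) {
      if (source[end] === "(") depth++;
      else if (source[end] === ")" && --depth === 0) break;
    }
    const call = source.slice(start, end + 1);
    // The finding is an absence: nothing in the call sets an expiry.
    if (!call.includes("expiresIn")) {
      results.push({ line: source.slice(0, start).split("\n").length, call });
    }
    from = end + 1;
  }
  return results;
}
```

The check reports what is not there, which is the inverse of what signature-based tools are built to do.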

The same gap applies to a long list of real, exploitable, common vulnerabilities:

  • app.use(cors()) with no options allows any origin. Valid code, default behaviour, default security disaster.
  • Multer with no fileFilter accepts any MIME type, including executables. Valid code, default behaviour, real-world exploit class.
  • export { JWT_SECRET, ... } widens the attack surface: every module that imports the auth module now has a path to your signing secret. Valid code, common pattern, scanners shrug.
  • “First user becomes admin” auto-bootstrap on a fresh deploy. if (userCount === 0) user.isAdmin = true; — race-condition exploitable in the first 30 seconds of any deploy. Valid code, common pattern.
  • 19 MCP servers spawning subprocesses with { env: process.env }. Every secret in your application environment is now in 19 child processes you don’t audit. Valid code, common copy-paste from Stack Overflow.
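For contrast, here is a minimal sketch of the non-default versions of two of these postures, using only Node built-ins. isAllowedOrigin is an exact-match allowlist of the kind you could hand to the cors package’s origin callback option. pickEnv builds an explicit environment for a spawned child instead of forwarding process.env. Both allowlists and all helper names are hypothetical placeholders.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical allowlist: the exact origins your frontend is served from.
const ALLOWED_ORIGINS = new Set(["https://app.example.com", "https://admin.example.com"]);

// Exact-match check; no wildcards, no suffix matching. It could back the cors
// package's origin option: origin: (o, cb) => cb(null, isAllowedOrigin(o)).
function isAllowedOrigin(requestOrigin: string | undefined): boolean {
  // No Origin header means there is no browser cross-origin request to approve.
  if (requestOrigin === undefined) return false;
  return ALLOWED_ORIGINS.has(requestOrigin);
}

// Hypothetical allowlist of variables a spawned server genuinely needs.
const CHILD_ENV_KEYS = ["PATH", "HOME", "LANG"];

// Build a child environment from allowlisted keys plus explicit extras,
// instead of forwarding the whole of process.env.
function pickEnv(
  source: Record<string, string | undefined>,
  keys: readonly string[],
  extra: Record<string, string> = {},
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of keys) {
    const value = source[key];
    if (value !== undefined) env[key] = value;
  }
  return { ...env, ...extra };
}

// Run inline Node code in a child that only sees the trimmed environment.
function runWithTrimmedEnv(code: string): string {
  const result = spawnSync(process.execPath, ["-e", code], {
    env: pickEnv(process.env, CHILD_ENV_KEYS),
    encoding: "utf8",
  });
  return result.stdout.trim();
}
```

Per-server credentials then go through the extra argument by name, so each child receives only the one token it needs.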

These aren’t bugs in the sense that scanners look for bugs. They’re security postures that need a human (or an AI agent) to walk the codebase and ask “did anyone make a decision here, or is this the default that everyone forgets to change?” Most teams discover the answer the hard way.


What “A Real OWASP Review” Actually Means

A targeted OWASP review is not the same exercise as installing a scanner. It is a 1–3 hour walk through twelve named categories, each with a grep pattern that surfaces the codebase’s specific use of that area, each with a “what to flag” list of postures and missing parameters, each ending with a copy-paste fix prompt the team can hand to an AI agent.

The twelve categories that matter for a typical Node/TypeScript/JWT/SQLite/WebSocket stack — adapt as your stack differs — are:

  1. Secrets & keys — hardcoded fallbacks, exported secrets, real credentials in .env* files committed to git
  2. SQL injection — template literals or string concatenation in queries (raw-SQL libraries are the high-risk surface; parameterised ORMs eliminate most of this category)
  3. JWT authentication — missing expiration, hardcoded secrets, exported secrets, no refresh, no revocation list
  4. WebSocket security — token in URL query string, one-time authentication at connect, no per-message validation, no timeout
  5. Command execution — exec()/spawn() with user input, shell: true, template literals in commands
  6. MCP / subprocess security — environment inheritance, file system reach, network access from spawned children
  7. File upload — missing fileFilter, no MIME validation, no extension whitelist, original filename preserved (path traversal)
  8. Frontend token storage — JWTs in localStorage (any XSS = permanent compromise), tokens not cleared on logout
  9. CORS configuration — cors() no options, origin: '*' or true, credentials + wildcard
  10. Rate limiting — none on auth endpoints, none on file upload, none globally
  11. XSS — dangerouslySetInnerHTML, innerHTML assignment with user input, unsanitised markdown rendering
  12. Encryption — static IVs, hardcoded keys, weak key derivation (MD5/SHA1), ECB mode, CBC without HMAC
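Category 12’s “static IV” and “CBC without HMAC” findings usually share one fix in Node: an authenticated mode with a fresh random IV per message. Here is a minimal sketch using node:crypto and AES-256-GCM. It assumes a 32-byte key loaded from a secrets manager or KMS rather than from source code. The iv | tag | ciphertext packing is one hypothetical layout, not a standard.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LENGTH = 12; // 96-bit nonce, the recommended size for GCM
const TAG_LENGTH = 16; // 128-bit authentication tag

// Encrypt with a fresh random IV every call; output is iv | tag | ciphertext.
function encrypt(key: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_LENGTH });
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Decrypt and authenticate; final() throws if any byte was tampered with.
function decrypt(key: Buffer, packed: Buffer): string {
  const iv = packed.subarray(0, IV_LENGTH);
  const tag = packed.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
  const ciphertext = packed.subarray(IV_LENGTH + TAG_LENGTH);
  const decipher = createDecipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_LENGTH });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM’s tag replaces the separate HMAC step, and because the IV is random, encrypting the same plaintext twice never produces the same output.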

Each category in your codebase will either pass (no findings), pass-with-tuning (findings but configured defensively), or fail (concrete issue at a specific file:line). The output is a report with severity counts, individual findings, and a fix prompt per finding.
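The report’s shape is small enough to pin down as types. Here is a minimal sketch with hypothetical field names and the CRITICAL/HIGH/MEDIUM/LOW levels from the severity rubric in the prompt. It also treats a category with no entries as NOT CHECKED rather than as a silent pass.

```typescript
type Verdict = "PASS" | "TUNE" | "FAIL";
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  category: number; // 1-12, matching the category list
  verdict: Verdict;
  location?: string; // "file:line"; omitted for a category-level PASS
  severity?: Severity; // set on FAIL findings
}

// Roll findings up per category. A category with no entries at all is
// NOT CHECKED, so silence never masquerades as a pass.
function categoryVerdict(findings: Finding[], category: number): Verdict | "NOT CHECKED" {
  const verdicts = findings.filter((f) => f.category === category).map((f) => f.verdict);
  if (verdicts.length === 0) return "NOT CHECKED";
  if (verdicts.includes("FAIL")) return "FAIL";
  if (verdicts.includes("TUNE")) return "TUNE";
  return "PASS";
}

// Severity counts for the report header, counting FAIL findings only.
function severityCounts(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { CRITICAL: 0, HIGH: 0, MEDIUM: 0, LOW: 0 };
  for (const f of findings) {
    if (f.verdict === "FAIL" && f.severity) counts[f.severity]++;
  }
  return counts;
}
```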

The trick — the part that turns this from a one-shot audit into a permanent capability — is to write the audit prompt itself as a reusable skill on the way out. The next audit starts from the existing skill, not from scratch.

The One Prompt That Does the Review

Copy this into Claude Code (or Codex) at the root of any production codebase:

Run a 12-category OWASP code review against this codebase. Output: a
findings report at `docs/security-review-<DATE>.md` AND a reusable skill
at `~/.claude/skills/<project>-security-review/SKILL.md` so subsequent
reviews build on this one rather than restarting.

═══ PHASE 0 — DISCOVER FIRST ═══

Inspect the codebase and write your findings to
`docs/security-review-<DATE>.md`. Cover:

  - Language(s) and framework(s) — drives which categories apply
    most. Our reference stack was Node 20 + Express + better-sqlite3
    + React + WebSocket (ws). Yours may be Python/Django, Go/Gin,
    Rails, Phoenix, etc. — the categories below adapt.

  - Authentication mechanism — JWT? Session cookies? OAuth? Magic
    links? Each has a different attack surface profile.

  - Database access pattern — raw SQL (better-sqlite3, pg) vs
    parameterised ORM (Prisma, Drizzle, SQLAlchemy). Raw-SQL means
    high-risk for category 2; ORM mostly eliminates it.

  - WebSocket usage — drives category 4 only if present.

  - Subprocess spawning — `child_process.exec/spawn` (Node), `subprocess`
    (Python), backticks (Ruby), shell-out anywhere. Drives category 5.
    If MCP servers are present (we had 19), drives category 6.

  - File upload — Multer (Node), Django FileField, ActiveStorage, etc.
    Drives category 7.

  - Frontend framework + token storage — React + localStorage was our
    case; could be Vue + cookies, or server-rendered with httpOnly
    cookies. Drives category 8.

  - CORS middleware setup — `app.use(cors())` in Express,
    `django-cors-headers`, etc.

  - Rate-limit middleware — present (express-rate-limit, slowapi) or
    absent. Most codebases: absent.

Write findings + the implementation plan to the report. Stop. I'll
confirm before you run the full review.

═══ PHASE 1 — RUN THE REVIEW (after spec approval) ═══

For EACH of the 12 categories below, do all of:
  a) Run the grep patterns listed (adapted to your project's
     directory layout)
  b) Examine the matches; classify each as PASS / TUNE / FAIL
  c) For every FAIL, record: file:line, the bad pattern, the risk
     (what an attacker can do), and a ready-to-paste fix prompt
  d) For every TUNE, record the config that should be changed
  e) Skip the category cleanly if it doesn't apply to this stack

═══ The 12 categories: ═══

1. SECRETS & KEYS
   Grep: `SECRET\|KEY\|TOKEN\|PASSWORD` across code + `.env*` files.
   `export.*SECRET` (widens attack surface — flag every match).
   `apiKey\|API_KEY` in frontend code.
   FLAG: hardcoded fallbacks (`process.env.X || 'default-value'`),
   secrets in exports, real credentials in any `.env*` file.

2. SQL INJECTION (skip if ORM-only)
   Grep: `\.\(prepare\|run\|get\|all\)(` and keep only the matches
   whose first argument opens with a backtick (template-string SQL is
   the HIGH-RISK pattern in better-sqlite3 and similar raw-SQL libs).
   FLAG: any user input interpolated into SQL strings via `${}`.

3. JWT AUTHENTICATION
   Grep: `jwt\.sign\|generateToken` — check for `expiresIn` option.
   `JWT_SECRET` exports.
   FLAG: `jwt.sign()` without `{ expiresIn: ... }` (THE BUG MOST
   TEAMS SHIP — non-expiring tokens + localStorage storage =
   permanent compromise on first XSS).
   FLAG: `JWT_SECRET` exported from the auth module.

4. WEBSOCKET SECURITY (skip if no WS)
   Grep: `WebSocketServer\|wss\.\|ws\.on('connection'`.
   `token=\|jwt=\|auth=` in frontend URL construction.
   FLAG: tokens in WS URL query strings (server and proxy access
   logs capture the full URL, query string included).
   FLAG: connection-time-only auth with no per-message validation.

5. COMMAND EXECUTION (skip if no shell-out)
   Grep: `child_process\|exec\|spawn` + `shell:\s*true`.
   FLAG: `shell: true` with ANY user-influenced input.
   FLAG: template literals with `${}` inside command strings.

6. MCP / SUBPROCESS ENVIRONMENT (skip if no MCP)
   Grep across MCP integration dirs: `env:\s*process\.env\|env:\s*{`.
   FLAG: full-env passing to subprocesses — every secret in the
   parent's environment is now in N child processes. THE BUG WE
   FOUND IN 19 PLACES.

7. FILE UPLOAD (skip if no upload)
   Grep: `multer\|upload\.` + `fileFilter\|fileSize\|mimetype`.
   FLAG: Multer config with no `fileFilter` (default accepts ANY
   MIME type, including executables).
   FLAG: original filename preserved (path traversal vector).

8. FRONTEND TOKEN STORAGE
   Grep: `localStorage\|sessionStorage` + `setItem.*token\|setItem.*jwt`.
   FLAG: JWT in localStorage — any XSS becomes account compromise.
   Cross-reference with category 3 (token expiration) — if both
   hold, ANY XSS = PERMANENT compromise.

9. CORS CONFIGURATION
   Grep: `cors(\|Access-Control` + `origin:\s*['"]\?\*\|origin:\s*true`.
   FLAG: `app.use(cors())` with no options — default is wide-open.
   FLAG: `origin: '*'` or `true` paired with `credentials: true`.

10. RATE LIMITING
    Grep: `rateLimit\|express-rate-limit\|slowapi`.
    FLAG: no rate limiting on auth endpoints (login, register, reset).
    FLAG: no global API rate limit.

11. XSS
    Grep frontend: `dangerouslySetInnerHTML\|innerHTML\|outerHTML\|document\.write`.
    FLAG: any of these with user-derived content.
    FLAG: markdown rendering without explicit sanitisation
    (marked + DOMPurify, or react-markdown's safe config).

12. ENCRYPTION (skip if no custom crypto)
    Grep: `crypto\.\|createCipher\|createDecipher\|scrypt\|pbkdf2`.
    FLAG: static or predictable IVs.
    FLAG: encryption keys stored in code.
    FLAG: ECB mode, CBC without authentication (no HMAC).

═══ Severity rubric: ═══

CRITICAL — exploitable now, no other defence in depth (e.g. JWT
fallback + localStorage + no expiration).
HIGH — exploitable given one other condition (e.g. wide-open CORS
on an API that handles user data).
MEDIUM — exploitable in specific scenarios (e.g. innerHTML on
text the user types in the same session).
LOW — defence-in-depth gap (e.g. verbose error logging that
leaks stack traces).

═══ PHASE 2 — CODIFY AS A REUSABLE SKILL ═══

After the review report is written, ALSO write
`~/.claude/skills/<project-name>-security-review/SKILL.md` with:

  - The discovered stack profile (so the next review starts pre-tuned)
  - The 12 categories adapted to this codebase's grep paths
  - The KNOWN VULNERABILITIES list — every FAIL finding from this
    review, with file:line, severity, and the fix-prompt template
  - The review WORKFLOW — your suggested 7-step Quick-Recon →
    Secrets → Auth → DB → Cmd → Upload → Hygiene sequence
  - A "Reporting Template" section so subsequent reviews produce
    consistent output

The skill should let a future reviewer (human or agent) re-run the
audit by saying "run a security review" — and the skill takes care
of the categories, the grep patterns, and the known-issue regression
checklist.

═══ PHASE 3 — VERIFY before considering the work done ═══

  - Every CRITICAL and HIGH finding has a copy-paste fix prompt
    in the report (so triage = pasting prompts, not engineering
    investigation).
  - The skill file is loaded and a quick re-run produces an
    identical category breakdown.
  - At least one fix prompt has been tested end-to-end (paste
    into an agent in a worktree branch, agent applies the fix,
    tests pass).
  - The report and the skill cross-reference each other so a
    reader of one finds the other.

Ship the report as `docs/security-review-<DATE>.md` (committed) and
the skill as a separate uncommitted file (it's a user-level skill,
not project code).

Adapt the categories to your stack. Python/Django flips category 2 from “raw SQL injection” to “ORM .raw(), .extra(), and RawSQL calls”; Rails adds “mass assignment via permit”; Go adds “race conditions in goroutines holding HTTP request scope.” The phase structure is what matters.
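Step (a) of Phase 1 is mechanical enough to script, which is also what makes adapting to another stack cheap. The categories become a table of regular expressions, so a Django or Rails port edits rows rather than logic. Here is a minimal sketch that scans in-memory file contents and emits file:line hits. The two table rows are illustrative translations of the prompt’s grep patterns, and reading files from disk is left out.

```typescript
interface CategoryPatterns {
  id: number;
  name: string;
  patterns: RegExp[]; // no /g flag, so .test() keeps no state between lines
}

// Two illustrative rows; another stack swaps these regexes, not the scanner.
const CATEGORIES: CategoryPatterns[] = [
  { id: 3, name: "JWT authentication", patterns: [/jwt\.sign\(/, /export.*JWT_SECRET/] },
  { id: 9, name: "CORS configuration", patterns: [/cors\(\s*\)/, /origin:\s*(['"]\*['"]|true)/] },
];

interface Hit {
  category: number;
  location: string; // "file:line", 1-based
  text: string;
}

// Every line matching any pattern of a category becomes one hit.
function scan(files: Record<string, string>, categories: CategoryPatterns[]): Hit[] {
  const hits: Hit[] = [];
  for (const [file, contents] of Object.entries(files)) {
    contents.split("\n").forEach((lineText, index) => {
      for (const category of categories) {
        if (category.patterns.some((p) => p.test(lineText))) {
          hits.push({ category: category.id, location: `${file}:${index + 1}`, text: lineText.trim() });
        }
      }
    });
  }
  return hits;
}
```

The hits are candidates, not verdicts. Step (b), classifying each one as PASS, TUNE, or FAIL, still needs a reader.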

Notice What the Prompt Is Doing

  • Discovery first, again. Phase 0 maps which of the 12 categories actually apply to this codebase. A Rails-with-Devise app skips JWT entirely (category 3); a static-site-with-Lambda-API skips WebSocket (category 4) and Multer (category 7). The audit shouldn’t grep for patterns that can’t exist.
  • PASS / TUNE / FAIL, not just FAIL. A category that passed, or passed with tuning, deserves a one-line entry in the report, not silence. “CORS: TUNE, currently app.use(cors()), recommended config provided” is more useful than no entry, because the absence of an entry doesn’t distinguish “passed” from “didn’t check.”
  • Fix prompts inside the report. Each FAIL ends with a ready-to-paste agent prompt that implements the fix. Triage becomes “read the finding, paste the prompt, review the diff” — the agent does the engineering. This is a 5–10x speedup on the fix phase relative to “engineer reads finding and writes code from scratch.”
  • Codification as the second deliverable. Phase 2 is the part most one-shot audits skip. Without the skill, next quarter’s review starts from zero — re-reading the categories, re-deriving the grep patterns, re-discovering the same findings. With the skill, the next review starts from “here are the 12 known issues from last time; have they regressed?” plus any new categories.
  • Cross-references closing the loop. The report names the skill; the skill names the report. A future engineer landing on either one finds the other. This sounds trivial. Skipping it is the reason most security-audit reports get lost in docs/archive/.
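Here is a minimal sketch of how one FAIL entry might render, so that triage really is “read, paste, review.” Location, risk, and the fix prompt travel together, and the report footer names the skill file so the cross-reference exists from the first write. The layout, field names, and skill path are hypothetical.

```typescript
interface FailFinding {
  category: string;
  location: string; // "file:line"
  severity: "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
  pattern: string; // the offending code
  risk: string; // what an attacker can do
  fixPrompt: string; // ready-to-paste agent prompt, possibly multi-line
}

// One finding becomes one self-contained section, fix prompt quoted last.
function renderFinding(f: FailFinding): string {
  return [
    `### [${f.severity}] ${f.category} at ${f.location}`,
    `Pattern: \`${f.pattern}\``,
    `Risk: ${f.risk}`,
    "Fix prompt:",
    ...f.fixPrompt.split("\n").map((line) => `> ${line}`),
  ].join("\n");
}

// The footer links the report to the per-project skill (the cross-reference).
function renderReport(findings: FailFinding[], skillPath: string): string {
  const body = findings.map(renderFinding).join("\n\n");
  return `${body}\n\nAudit memory skill: ${skillPath}`;
}
```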

What Actually Goes Wrong (Real Gotchas From Twelve Findings)

The prompt didn’t come from theory. Twelve concrete findings in one codebase shaped it. The most instructive four:

Gotcha 1

Non-expiring JWT + localStorage = permanent compromise on first XSS.

The vulnerability isn’t in either piece individually. JWT in localStorage is convenient and common; tokens without expiration are technically valid. The compound is the catastrophe: an XSS anywhere in the React app reads the localStorage JWT and exfiltrates it. Because the JWT has no expiration, the attacker now has the user’s account forever — there’s no time-based recovery, only revocation (which most JWT setups don’t implement).

In the codebase we audited, both halves of the compound were present: jwt.sign(payload, JWT_SECRET) with no third-argument expiration AND localStorage.setItem('token', jwt) in the frontend. A single exploitable XSS anywhere in the app would have meant complete account takeover, with no recovery path short of rotating the signing secret and logging out every user.
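The missing piece is small. In the jsonwebtoken package it is the expiresIn option. To show what that option buys without depending on the library, here is a minimal HS256 sketch on node:crypto that writes an exp claim and refuses tokens past it. The function names are hypothetical, and the sketch is illustrative only: production code should keep using a maintained JWT library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const b64url = (text: string) => Buffer.from(text, "utf8").toString("base64url");
const hs256 = (data: string, secret: string) => createHmac("sha256", secret).update(data).digest();

// Sign with HS256 and an exp claim ttlSeconds after nowSec (the role
// jsonwebtoken's expiresIn option plays).
function signWithExpiry(payload: object, secret: string, ttlSeconds: number, nowSec = Math.floor(Date.now() / 1000)): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify({ ...payload, iat: nowSec, exp: nowSec + ttlSeconds }));
  const sig = hs256(`${header}.${body}`, secret).toString("base64url");
  return `${header}.${body}.${sig}`;
}

// Check the signature (always as HS256, whatever the header claims), then
// refuse tokens whose exp is missing or not in the future.
function verifyWithExpiry(token: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;
  const expected = hs256(`${header}.${body}`, secret);
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null;
  return claims;
}
```

With a TTL in place, a token stolen through XSS stops working at exp even if nobody notices the theft, which restores the time-based recovery the compound finding removed.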

Lesson: when you audit, look for compound vulnerabilities, not single ones. Scanners check categories independently. The interesting findings live at the intersections.
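One way to make “look at the intersections” mechanical is to run a small pass of compound rules after per-category classification, raising severity when specific findings co-occur. Here is a minimal sketch with a single rule encoding this gotcha (category 3’s missing expiry plus category 8’s localStorage storage). The tags and rule format are hypothetical.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
const RANK: Record<Severity, number> = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };

interface TaggedFinding {
  category: number;
  tag: string; // hypothetical machine-readable label, e.g. "jwt-no-expiry"
  severity: Severity;
}

interface CompoundRule {
  requires: string[]; // every tag must be present for the rule to fire
  escalateTo: Severity;
  reason: string;
}

const RULES: CompoundRule[] = [
  {
    requires: ["jwt-no-expiry", "jwt-in-localstorage"],
    escalateTo: "CRITICAL",
    reason: "Any XSS exfiltrates a token that never expires: permanent compromise.",
  },
];

// Fire every rule whose tags all co-occur; raise (never lower) the severity
// of the participating findings and return the reasons for the report.
function applyCompoundRules(findings: TaggedFinding[], rules: CompoundRule[]): string[] {
  const present = new Set(findings.map((f) => f.tag));
  const fired: string[] = [];
  for (const rule of rules) {
    if (!rule.requires.every((tag) => present.has(tag))) continue;
    fired.push(rule.reason);
    for (const f of findings) {
      if (rule.requires.includes(f.tag) && RANK[rule.escalateTo] > RANK[f.severity]) {
        f.severity = rule.escalateTo;
      }
    }
  }
  return fired;
}
```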

Gotcha 2

export { JWT_SECRET, ... } widens the attack surface for free.

The original auth module exported JWT_SECRET alongside the actual auth functions: export { validateApiKey, authenticateToken, generateToken, authenticateWebSocket, JWT_SECRET };. Every file in the project that imports any of those four functions now also has the option to import the signing secret. Not because the developer needed it — because the export list happened to include it.

This is the kind of finding scanners can’t make. The code is valid. The export is valid. The secret being in scope of the module is necessary. What’s wrong is the transitive surface area the export creates: instead of one file holding the secret, every consumer could now hold it.

Lesson: “what’s exported” is a security category in its own right, distinct from “what’s stored” and “what’s used.” Audit your exports; they widen attack surface even when they’re not actively misused.
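One structural fix is to make the secret unreachable rather than merely unexported by convention. In this minimal sketch, a factory closes over the secret and returns only the functions callers need, so importers receive capabilities, not the key. createAuth and its demo token format (not a JWT) are hypothetical. In a plain ES module, the equivalent fix is simply dropping JWT_SECRET from the export list.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// The secret lives only in this closure; nothing returned exposes it.
function createAuth(secret: string) {
  const mac = (data: string) => createHmac("sha256", secret).update(data).digest();
  return Object.freeze({
    // Demo token: "<subject>.<hex HMAC>" (not a JWT; illustration only).
    generateToken(subject: string): string {
      return `${subject}.${mac(subject).toString("hex")}`;
    },
    // Returns the subject for a valid token, null otherwise.
    verifyToken(token: string): string | null {
      const dot = token.lastIndexOf(".");
      if (dot <= 0) return null;
      const subject = token.slice(0, dot);
      const given = Buffer.from(token.slice(dot + 1), "hex");
      const expected = mac(subject);
      return given.length === expected.length && timingSafeEqual(given, expected) ? subject : null;
    },
  });
}
```

In production, the secret passed to createAuth would come from the environment, failing fast if it is unset, rather than from a hardcoded fallback (category 1).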

Gotcha 3

Multer with no fileFilter accepts ANY MIME type — every Stack Overflow snippet shows this.

The most copied Multer configuration on the internet is the bare one: const upload = multer({ dest: 'uploads/' });. It works. It accepts JPEGs. It also accepts .exe, .sh, .php, and .htaccess. Those files land in a directory your web server may end up serving if you’re not careful. Multer itself gives dest uploads random names, but the equally common diskStorage snippet whose filename callback returns file.originalname keeps the client’s filename, which is a path traversal vector if it isn’t sanitised.

The fix is a 6-line fileFilter function rejecting anything not on a small whitelist of MIME types. Trivial. The reason it’s missing in the wild is that nearly every tutorial shows the bare config; few show the secure version.
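Here is a minimal sketch of such a filter, written against the (req, file, cb) callback shape of Multer’s fileFilter option. A small local type stands in for Multer’s file object so the sketch runs without the package. Two caveats are built in. First, file.mimetype comes from the client’s multipart headers, so the extension is checked as well. Second, the on-disk name is generated server-side. The allowlist and helper names are hypothetical.

```typescript
import { randomUUID } from "node:crypto";
import { extname } from "node:path";

// Minimal stand-in for the fields of Multer's file object used here.
interface UploadFile {
  originalname: string;
  mimetype: string; // client-supplied, so never trusted on its own
}

// Hypothetical allowlist: MIME type -> extensions accepted for it.
const ALLOWED = new Map<string, string[]>([
  ["image/jpeg", [".jpg", ".jpeg"]],
  ["image/png", [".png"]],
  ["application/pdf", [".pdf"]],
]);

// Same shape as Multer's fileFilter: cb(null, true) accepts, cb(null, false) skips.
function fileFilter(_req: unknown, file: UploadFile, cb: (err: Error | null, accept?: boolean) => void): void {
  const ext = extname(file.originalname).toLowerCase();
  cb(null, ALLOWED.get(file.mimetype)?.includes(ext) ?? false);
}

// Server-chosen name; the extension comes from the allowlist, never from the
// client's filename. Intended for files fileFilter has already accepted.
function safeFilename(file: UploadFile): string {
  const ext = ALLOWED.get(file.mimetype)?.[0] ?? "";
  return `${randomUUID()}${ext}`;
}
```

Wiring it up would look roughly like multer({ storage: multer.diskStorage({ destination: 'uploads/', filename: (req, file, cb) => cb(null, safeFilename(file)) }), fileFilter, limits: { fileSize: 5 * 1024 * 1024 } }), where the 5 MB limit is only an example value.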

Lesson: the most-copied snippet of any popular library is almost always the “demonstrate it works” version, not the “use it in production” version. Audit every library’s “starter code” against its own production-hardening guide.

Gotcha 4

“First user becomes admin” auto-bootstrap is exploitable in the first 30 seconds of any deploy.

The bootstrap pattern: if (userCount === 0) user.isAdmin = true; runs at registration time. On the first registration after a fresh deploy (or a fresh database), that user becomes admin. The exploit window is the time between deploying the app and the legitimate admin creating the first account. If anyone else hits /api/register first, they’re admin.

For a public-facing app this is a real attack — bots crawl new deploys looking for unattached admin paths. For an internal app it’s less critical but still a deployment-procedure hazard.

Lesson: “first one wins” bootstraps look elegant until they’re contests. Replace every bootstrap-on-registration with an explicit, out-of-band admin-creation flow. The first user should be admin because someone made them admin, not because they were first.
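Here is a minimal sketch of that replacement, with an in-memory Map as a hypothetical stand-in for the users table. Registration never sets isAdmin. Promotion lives in a separate function meant to be called from a one-off script or CLI run by an operator, never from an HTTP route.

```typescript
interface User {
  email: string;
  isAdmin: boolean;
}

// Hypothetical in-memory stand-in for the users table.
const users = new Map<string, User>();

// Public registration path: no user-count check, no privilege granted.
function register(email: string): User {
  if (users.has(email)) throw new Error("already registered");
  const user: User = { email, isAdmin: false };
  users.set(email, user);
  return user;
}

// Out-of-band path: run by an operator (e.g. from a deploy script), never
// wired to an HTTP route. The target account must already exist.
function promoteToAdmin(email: string): User {
  const user = users.get(email);
  if (!user) throw new Error(`no such user: ${email}`);
  user.isAdmin = true;
  return user;
}
```

Winning the registration race now buys an attacker an ordinary account and nothing more.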


What This Costs

Three to four hours of an engineer’s afternoon, give or take how big the codebase is and how many of the 12 categories are FAILs rather than PASSes. The audit prompt runs the categories in parallel where possible (greps don’t depend on each other); the human reviews findings as they land; the fix prompts go into a worktree branch where the agent applies them; the codified per-project skill takes the last 20 minutes.

The recurring cost is the next review. With the skill in place, “run a security review” runs the same 12 categories against the codebase as it stands today, diffs against the last known-vulnerabilities list, and produces a delta report. That diff-review is 30–60 minutes a quarter, not 3–4 hours a quarter.
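The delta report is a set difference over a stable key. Here is a minimal sketch that keys each finding on category, file, and the offending pattern. Line numbers are excluded from the key because they drift between audits. The field names and key format are hypothetical.

```typescript
interface KnownFinding {
  category: number;
  file: string;
  pattern: string; // the offending code, normalised
  line: number; // informational only; excluded from the key
}

const keyOf = (f: KnownFinding) => `${f.category}|${f.file}|${f.pattern}`;

interface Delta {
  fixed: KnownFinding[]; // on the known list, gone now
  stillOpen: KnownFinding[]; // in both (reported with current line numbers)
  introduced: KnownFinding[]; // new since the last review
}

function diffFindings(known: KnownFinding[], current: KnownFinding[]): Delta {
  const knownKeys = new Set(known.map(keyOf));
  const currentKeys = new Set(current.map(keyOf));
  return {
    fixed: known.filter((f) => !currentKeys.has(keyOf(f))),
    stillOpen: current.filter((f) => knownKeys.has(keyOf(f))),
    introduced: current.filter((f) => !knownKeys.has(keyOf(f))),
  };
}
```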

Make the Prompt Itself a Skill (One Step Beyond)

The prompt above writes a project-specific skill as Phase 2 — that’s the per-codebase audit memory. There’s one more codification step worth doing: make the audit prompt itself into a user-level skill so it’s available from any project, not just the one you ran it in first.

The two-skill pattern looks like this:

  • ~/.claude/skills/owasp-code-review/SKILL.md: the audit-runner. User-level (available from any working directory). Contains the 12 categories, the grep patterns, the discovery process, the report template. Invoke from any new project with “run an OWASP review” and the audit walks the categories cold.
  • ~/.claude/skills/<project-name>-security-review/SKILL.md: the audit memory. Per-project. Contains the discovered stack profile, the known-vulnerabilities list with fix-prompts, and a “regressions” section that grows over time. The audit-runner writes this skill on first pass and reads it on subsequent passes, so already-hardened categories become quick regression checks instead of full re-audits.

The user-level audit-runner is the multiplier. Without it, every new codebase starts the audit from a copy-pasted prompt in a conversation history. With it, “run an OWASP review” is a one-liner that loads the categories, the patterns, and the workflow into context — and knows to look for an existing per-project memory skill before starting fresh.

To create the user-level skill from this article: paste the prompt above into a Claude Code session and ask it to “turn this prompt into a user-level skill at ~/.claude/skills/owasp-code-review/SKILL.md so I can invoke it from any project — keep the 12 categories, the grep patterns, the phased structure, the report and fix-prompt format.” The first invocation is the article’s prompt; the second invocation is the skill. The third invocation is one word: “audit.”

The Broader Point

There are two kinds of security work in a codebase. The first is the catch-them-as-they-arrive layer: scanners, dependency-bot PRs, secret-scanning push protection. That work is largely commoditised — see the security scanning stack piece for the prompt that sets that up in an afternoon.

The second kind is the periodic targeted review: are the security postures of our codebase still defensible against the patterns we know to look for? That review used to require a security consultant. It doesn’t anymore. What it requires is a prompt that names the categories, the grep patterns, and the known-bad postures — and an AI agent that can execute the prompt, write findings with file:line citations, and emit fix prompts the team can paste.

The codification step is what makes it durable. A one-shot audit goes into docs/archive/ within two quarters. A skill in ~/.claude/skills/ is alive: it accumulates the codebase’s known issues over time, checks for regressions against them automatically on every re-run, and is the right shape for the next engineer to pick up cold.

The 24-hour-build piece argued that security scanning is no longer a specialised discipline. This piece argues the next layer: security review isn’t either. The deep, codebase-specific, OWASP-by-category pass that used to be a $5,000 consulting engagement is now a 4-hour afternoon, with a reusable artefact at the end. Run it once. Use the skill. Re-run it quarterly. The codebase gets safer because the audit got cheap.

Built against a Node 20 + Express + better-sqlite3 + React + WebSocket + MCP codebase with 19 integration servers. The first audit pass found 12 distinct vulnerabilities across the 12 OWASP categories (4 critical, 5 high, 3 medium) — every one is documented in the codified review skill with a copy-paste fix prompt. The work was done with Claude Code (Opus 4.7) in a single afternoon; the resulting skill makes the next quarterly review a 45-minute task instead of starting from scratch.