June 8, 2026

Your Stripe Key Lives in Six Places. Most Teams Can’t List Three of Them.

A typical production SaaS has 25–40 secrets spread across six or more stores. Most teams couldn’t enumerate them all from memory at 11pm on a Saturday, and that’s why secret rotations get deferred indefinitely. Here’s the business case for fixing it, and the one-prompt audit that turns “we should document this” into a 4-hour reality.

The Question Every CTO Eventually Hits

Tuesday. Stripe emails you that one of your API keys has been seen in a public GitHub commit. You have 24 hours to rotate it. You think you can do it in an afternoon.

You can’t. Here’s the sequence that plays out:

Where’s the live key stored? AWS Secrets Manager? SSM? Vercel env? .env.production checked into a private repo? All of the above?
Which environment(s) does it live in: prod, staging, admin sub-app?
Is it ALSO mirrored as a GitHub Actions repo secret for CI?
Is it baked into the Docker image at build time, or read at runtime?
Is the STRIPE_TEST_KEY in CI a different value from STRIPE_SECRET_KEY in prod? (Yes, and crossing them once is a customer-data incident.)
Does any teammate’s .env.local need the new value?
Does Stripe support an overlap window, or is this an instant cutover?
What’s the redeploy choreography: SSM update first, then ECS task swap, or the other way around?

Every SaaS founder I know has had at least one of: “rolled the key, broke webhook signature verification for 4 hours,” “rotated Postmark, two days of welcome emails silently failed,” “found out we had three copies of the Supabase service-role key and only updated one.”

The reason isn’t ignorance. The reason is that the inventory doesn’t exist. Nobody has ever sat down and written a single document that lists every secret, every store, every reader, every rotation procedure. So when a rotation lands at 4pm on Friday, half the work is rediscovering the system from scratch.

What good looks like

The deliverable is one document, typically docs/secrets-management.md, that answers these five questions cold:

Where does each secret live, in every store? SSM, GitHub secrets, local env, application database, third-party dashboards, hardcoded literals. Yes, hardcoded literals.
What classes are they? Platform-issued (Stripe, Postmark: the issuer owns the lifecycle), internal-minted (you mint them; you rotate them), per-tenant (customer credentials, encrypted at rest in your DB).
What does each secret unlock? The exact file paths in your codebase that read it. When the rotation breaks something, you want to know which symptom maps to which value.
How do you rotate each one without downtime? A per-issuer procedure that spells out the overlap window, meaning how long the old and new values can both be valid. Stripe gives you 12 hours; Postmark gives you forever (it supports multiple active tokens); your own webhook bearer gives you zero seconds.
What’s drifted? Which secrets exist in prod SSM but not staging? Which CI secrets are referenced by workflows but missing from the secret store? Which production secrets are older than a year?

This kind of audit looks like it should take a week. It doesn’t, because the underlying work is mechanical: grep the codebase, list the SSM keys, read the deploy workflow’s secrets:[] block, cross-reference with .env.example. That’s exactly the kind of “tedious but well-defined” work AI coding agents handle in an afternoon, if you frame the prompt right.
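The grep step is easy to picture. Here’s a minimal sketch in Python (the app itself is Node; Python is only the illustration language here) that maps every process.env.NAME read to the files containing it, which is the raw material for the reader-path column. It assumes JS/TS sources and uppercase names, and it only catches dotted and bracketed access: destructuring such as const { X } = process.env slips through, and non-secrets like NODE_ENV still need a human to weed out.

```python
import re
from pathlib import Path

# Matches process.env.NAME, process.env["NAME"] and process.env['NAME'].
# Destructuring (const { NAME } = process.env) is not covered by this sketch.
ENV_REF = re.compile(
    r"""process\.env(?:\.([A-Z][A-Z0-9_]*)|\[\s*["']([A-Z][A-Z0-9_]*)["']\s*\])"""
)

SOURCE_SUFFIXES = (".js", ".mjs", ".cjs", ".ts", ".tsx")


def env_refs_in_text(text: str) -> set[str]:
    """Every env-var name read via process.env in one source string."""
    # findall returns (dotted, bracketed) tuples; exactly one side is non-empty.
    return {dotted or bracketed for dotted, bracketed in ENV_REF.findall(text)}


def env_readers(root: str) -> dict[str, list[str]]:
    """Map each env-var name to the source files that read it."""
    readers: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if "node_modules" in path.parts:
            continue  # third-party code is not our reader surface
        for name in env_refs_in_text(path.read_text(errors="ignore")):
            readers.setdefault(name, []).append(str(path))
    return readers
```

Calling env_readers(".") from the repo root returns the name-to-files map; any name in it that is missing from .env.example is already a drift candidate.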

The prompt

Copy this into Claude Code (or Codex) at the root of any production codebase:

Build a complete secrets inventory + rotation procedures + drift
detection for this codebase. Output is a single living document at
`docs/secrets-management.md`.

═══ PHASE 0 — DISCOVER FIRST ═══

Before writing anything, inspect the codebase and write your findings
to `docs/secrets-management.md`. Cover:

  - Every secret STORE the system uses. For our reference SaaS this
    was six:
      - A cloud secret store with per-environment namespaces
        (we used AWS SSM Parameter Store: `/planb/`, `/planb-staging/`,
        `/planb-admin/`)
      - CI-platform repo secrets (we used GitHub Actions repo secrets)
      - Local `.env.local` files (developer machines)
      - A `private.secrets` table in the application Postgres for
        per-tenant customer credentials (we used Supabase with
        SECURITY DEFINER RPCs and a `private` schema)
      - Third-party issuer dashboards (Stripe, Postmark, Cloudflare,
        Supabase, OpenRouter — the actual source of truth)
      - Hardcoded source literals in any sibling repo (we found two
        in our AWS Lambda repo — see #4 WORST-CASE below)
    Your project may have fewer stores or different ones (Vercel env,
    Doppler, 1Password, Kubernetes Secrets, AWS Secrets Manager,
    HashiCorp Vault). List what's actually there.

  - The deploy mechanism that READS the secrets at runtime. For us
    this was the ECS task definition's `secrets:[]` block declared in
    `.github/workflows/deploy-app.yml`. That YAML block is the de-facto
    contract for "what env vars does the running container expect."
    Yours might be a Vercel/Netlify env dashboard, a Kubernetes Secret
    mount, a Pulumi/Terraform module, an SST stack file.

  - The IAM principals (or platform equivalents) that can READ each
    store. For us this was the ECS task execution role, the OIDC
    deploy role, a `planb-ops` IAM user with `AmazonSSMFullAccess`,
    and a legacy IAM user with over-broad permissions still attached
    (one of the drift items the audit found).

  - Cross-repo coupling. List every secret that is duplicated between
    this repo and any sibling repo, Lambda, Worker, Cloud Function,
    or external service. This is where the worst drift hides.

Write all findings to `docs/secrets-management.md`. Then write the
implementation plan in the same doc. Stop. I'll confirm before you
implement.

═══ PHASE 1 — IMPLEMENT (after spec approval) ═══

1. INVENTORY MAP — a table at the top of the doc listing every secret
   STORE with: what lives there, how secrets are issued INTO it, what
   code READS from it. One row per store. Five-to-eight rows. Make it
   scannable.

2. CLASS-THE-SECRETS — split every secret into three buckets:
   - PLATFORM secrets (one value per environment; lifecycle owned by
     the third-party issuer — Stripe, Postmark, Supabase, etc.)
   - INTERNAL-MINTED secrets (we generate them; they live entirely
     within our stores — webhook bearers, signing secrets,
     health-check tokens)
   - PER-TENANT secrets (customer-supplied, stored encrypted-at-rest
     in the application database; never displayed back; only hints
     shown — first 3 + last 2 chars)

3. PER-SECRET INVENTORY — for each secret, a row in a per-class table
   listing: cloud-store presence per environment, CI-secret presence,
   `.env.example` entry, exact code-reader file paths, and the issuer
   (dashboard URL or "internal-minted, see Rotation §"). The reader-
   path columns are not optional — when rotating, you need to know
   which code will fail.

4. CROSS-REPO COUPLING — THE WORST-CASE PATTERN to find and document.
   If any secret is stored both in your cloud secret store AND ALSO
   as a hardcoded string literal in another repo (Lambda, Worker, etc),
   call it out by name with the exact file path and line number. We
   found two: `LAMBDA_CALLBACK_TOKEN` and `LAMBDA_GETSCHEDULES_TOKEN`,
   each as a 69-character string-literal default in our Lambda repo's
   constructor. Rotation becomes a multi-store, multi-deploy, zero-
   overlap-window operation instead of a one-shot SSM update.
   Document the rotation choreography step-by-step. Recommend moving
   the cross-repo side to read from env-supplied-from-SSM instead of
   a string literal, and file a follow-up issue for it.

5. ROTATION PROCEDURES — one subsection per secret CLASS, then per
   issuer where it differs. For each, document:
   - The dashboard or CLI command that mints the new value
   - Whether the issuer supports an OVERLAP WINDOW (Stripe: yes, 12h.
     Postmark: yes, indefinite — multiple-token-support. Internal
     bearers: NO — instant cutover.)
   - Every store that needs updating, in dependency order
   - The redeploy required after the SSM update (and which services)
   - The verification step BEFORE expiring the old value

6. ACCESS CONTROL — list every IAM principal (or platform equivalent)
   with read access to each store. Note any legacy permissions that
   should be cleaned up. Cross-reference with the IAM-rationalisation
   plan if one exists.

7. DRIFT GAPS — name every inconsistency between stores as a numbered
   item (D1, D2, ...). The patterns to watch for:
   - "Secret X exists in `/prod/` but not in `/staging/`. The Y code
     path will fail-soft (worse than fail-hard — it goes unnoticed)."
   - "Secret Y is in SSM but missing from the bootstrap-uploader
     script (`aws-push-secrets.sh` in our case) — a fresh environment
     provisioning would silently omit it."
   - "Secret Z's `LastModified` date is unknown — cloud-store upload
     time is NOT the same as issuer rotation time. Manual audit
     against the issuer dashboard required."
   Each gap gets a severity (High / Medium / Low) and a named owner.

8. OPS CHECK — write a `scripts/ops/check-secret-drift.sh` (or your
   project's ops-script equivalent). It should diff:
   - `.env.example` ↔ cloud-store keys per environment
   - cloud-store keys ↔ deploy-workflow `secrets:[]` block
   - CI-secret references in `.github/workflows/*.yml` ↔
     `gh secret list`
   - Per-tenant secrets older than 180 days (proactive nudge fuel)
   Wire into the daily ops report. Any new drift becomes visible
   within 24 hours, not 6 months later when a customer reports
   it broken.

═══ PHASE 2 — VERIFY before shipping ═══

  - Each Dx drift gap in the doc has a clear severity + owner
  - `check-secret-drift.sh` runs cleanly and produces actionable
    output (no false positives that drown the signal)
  - At least one rotation procedure is end-to-end tested in staging
    (or, if no staging exists, documented well enough that a
    third-party engineer could execute it cold)
  - The cross-repo hardcoded-literal items (if any) have a follow-up
    issue filed for the source-literal removal
  - The doc references the deploy workflow / IaC files by exact path
    so future-me can find them

Ship as a single PR. The PRD-style document is the deliverable — don't
split it. The ops-check script can be a separate small PR.

Swap AWS SSM for Doppler or Vault, Stripe/Postmark for whatever issuers you use, Supabase for your DB. The structure does the work.
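One rule in the prompt is worth pinning down in code: per-tenant secrets are never displayed back, only a hint built from the first 3 and last 2 characters. A minimal Python sketch follows; the 15-character floor below which only a fixed mask is shown is an assumption, not something from our audit, added because a 5-character hint of a short value gives most of it away.

```python
MASK = "•" * 8


def secret_hint(value: str, head: int = 3, tail: int = 2, min_len: int = 15) -> str:
    """Display-safe hint for a stored per-tenant secret: first `head` + last `tail` chars.

    min_len is an assumed safety floor: below it, the hint would reveal too much
    of the value, so only a fixed mask is returned.
    """
    if len(value) < max(min_len, head + tail):
        return MASK
    # Slice from len(value) - tail so tail=0 yields "" rather than the whole string.
    return f"{value[:head]}…{value[len(value) - tail:]}"
```

One option is to compute the hint once at write time and store it next to the encrypted value, so the UI never has to decrypt anything just to show it.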

What it does

Discovery first. Phase 0 forces the agent to map the actual store topology before assuming anything. The reference stores are named (SSM, GitHub secrets, Postgres, hardcoded literals) so the agent has concrete patterns to match, and it’s told to adapt to whatever’s there.
Three secret classes, not one bucket. Platform / internal-minted / per-tenant secrets have different rotation choreographies and different blast radii. Lumping them into one list hides those differences.
The worst-case pattern named upfront. Cross-repo hardcoded literals (#4) are the failure mode that costs teams hours of downtime when nobody knew they existed. Naming the exact pattern guides the agent to look for it explicitly, with file paths and line numbers.
Reader paths are mandatory. The single most useful column in the per-secret inventory is “what code reads this value.” Without it, the inventory is decoration. With it, every rotation gets a 30-second pre-flight check.
Drift detection as a script, not a one-shot scan. Phase 1 #8 wires drift detection into the ongoing ops report. Without that, the audit is only as fresh as the day someone last ran it.
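The diffing at the heart of check-secret-drift.sh is plain set arithmetic once each side has been reduced to a set of key names. The sketch below is Python rather than shell. It assumes you have already fetched the per-environment key lists (for SSM, for example via aws ssm get-parameters-by-path --path /planb/ --recursive) and the names in the deploy workflow’s secrets:[] block, and that one workflow key list applies to every environment, which may not hold for you.

```python
def parse_env_example(text: str) -> set[str]:
    """Key names declared in a .env.example file (comments and blank lines skipped)."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.add(line.split("=", 1)[0].strip())
    return keys


def drift_report(
    env_example: set[str],
    store_by_env: dict[str, set[str]],
    workflow_keys: set[str],
) -> dict[tuple[str, str], list[str]]:
    """Map (environment, key) to the sources that expect that key but find it missing.

    A key is "expected" in an environment's store if .env.example declares it,
    the deploy workflow injects it, or any other environment already has it.
    Grouping reasons per key keeps one gap from producing three separate alerts.
    """
    in_any_env = set().union(*store_by_env.values())
    sources = {
        ".env.example": env_example,
        "deploy workflow secrets:[] block": workflow_keys,
        "another environment's store": in_any_env,
    }
    report: dict[tuple[str, str], list[str]] = {}
    for env, present in store_by_env.items():
        for source, expected in sources.items():
            for key in sorted(expected - present):
                report.setdefault((env, key), []).append(source)
    return report
```

In a daily ops job, a non-empty report would be printed and the process would exit non-zero. The same pattern extends to the prompt’s fourth comparison: CI-secret names referenced in the workflow files against the output of gh secret list.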

What goes wrong

The prompt came out of an audit that found ten drift items in a production system. The instructive ones:

Finding 1

Two production bearer tokens were string literals in another repo.

Our cross-repo Lambda backend authenticated callbacks to the main app using a bearer token. The token was in AWS SSM on the main-app side; on the Lambda side it was a 69-character string literal at line 9 of bubble-callback/index.js. Every constructor call defaulted to the literal. Rotation meant: change the literal, redeploy the Lambda stack via SAM, push the new value to SSM, force a new ECS deploy of the main app, and accept a window of several seconds in which callbacks fail auth, because the redeployed Lambda is already sending the new token while the old ECS tasks still expect the old one.

Lesson: the riskiest secrets are the ones where rotation requires coordinated deploys across multiple repos. Audit for hardcoded literals in any sibling repo by name. They’re invisible to every drift-check tool that operates on store-to-store comparisons.
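Searching a sibling repo for each secret’s name catches literals bound to a named variable. A rough complement, sketched in Python below, flags any long, random-looking quoted string. The 32-character minimum and the 3.5 bits-per-character entropy cutoff are illustrative assumptions, not values from our audit; expect false positives from hashes and test fixtures. Hits are redacted so the scan itself doesn’t leak what it finds.

```python
import math
import re
from collections import Counter

# A quoted literal (', " or `) made only of token-ish characters, 32+ long.
LITERAL = re.compile(r"""(["'`])([A-Za-z0-9_\-+/=.]{32,})\1""")


def shannon_entropy(s: str) -> float:
    """Bits per character, from the string's own character frequencies."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def suspicious_literals(source: str, min_entropy: float = 3.5) -> list[tuple[int, str]]:
    """(line number, redacted prefix) for each long, random-looking string literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for _quote, literal in LITERAL.findall(line):
            if shannon_entropy(literal) >= min_entropy:
                hits.append((lineno, literal[:4] + "…"))  # never echo the full value
    return hits
```

Running it over every source file in the Lambda repo is a cheap first pass; each hit still needs a human to decide whether it is a credential.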

Finding 2

Staging was missing two secrets that production had.

UNSUBSCRIBE_TOKEN_SECRET and GA4_API_SECRET existed in /planb/ SSM but not in /planb-staging/. The email-send code reads UNSUBSCRIBE_TOKEN_SECRET to HMAC-sign one-click unsubscribe URLs (RFC 8058). In staging it would either throw at startup or, much worse, sign with undefined without complaint, producing URLs that look valid but fail verification. The GA4 helper was fail-soft and just warned. Both were invisible until the audit ran.

Lesson: staging-vs-prod SSM drift is the most common production-secrets bug, and it always favors silent failure over loud failure. Any audit that doesn’t explicitly diff per-environment is missing the most important comparison.
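The unsubscribe case shows why a missing secret should stop the process at startup rather than reach the signing code. Here’s a Python sketch of that pattern (the real app is Node). The URL shape and signing only the email address are assumptions for illustration; RFC 8058 defines the List-Unsubscribe-Post header, not how you build the token.

```python
import hashlib
import hmac
import os
from collections.abc import Mapping
from urllib.parse import urlencode


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent from this environment."""


def require_secret(name: str, environ: Mapping[str, str] = os.environ) -> bytes:
    """Return the secret as bytes, or fail hard instead of signing with nothing."""
    value = environ.get(name, "")
    if not value:
        raise MissingSecretError(f"{name} is not set in this environment")
    return value.encode()


def _sign(email: str, secret: bytes) -> str:
    return hmac.new(secret, email.encode(), hashlib.sha256).hexdigest()


def unsubscribe_url(base_url: str, email: str, secret: bytes) -> str:
    """One-click unsubscribe URL carrying an HMAC-SHA256 token over the address."""
    query = urlencode({"email": email, "token": _sign(email, secret)})
    return f"{base_url}?{query}"


def verify_unsubscribe(email: str, token: str, secret: bytes) -> bool:
    """Constant-time check that the token was minted with this environment's secret."""
    return hmac.compare_digest(_sign(email, secret), token)
```

Calling require_secret("UNSUBSCRIBE_TOKEN_SECRET") once when the process boots, and passing the bytes down, turns a staging container without the parameter into a loud crash on deploy instead of a stream of links that never verify.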

Finding 3

The bootstrap script was three secrets behind the deploy workflow.

Our aws-push-secrets.sh script reads .env.local and pushes to SSM. It is the entry point for any environment provisioning. It hadn’t been updated when three new secrets (UNSUBSCRIBE_TOKEN_SECRET, GA4_ID, GA4_API_SECRET) were added to the deploy workflow’s secrets:[] block. A fresh environment provisioned with the script would silently come up missing those three values.

Lesson: the bootstrap script and the deploy workflow are two ends of the same contract. They stay in sync only if a drift-detector tells you when they don’t.
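A check for this particular gap can read the names straight out of the task definition. The Python sketch below assumes the deploy workflow renders a standard ECS task definition, where each containerDefinitions[].secrets[] entry has a name and a valueFrom. How you extract the key list from aws-push-secrets.sh depends on how that script is written, so here it is simply passed in.

```python
import json


def task_definition_secret_names(task_def_json: str) -> set[str]:
    """Env-var names an ECS task definition injects via secrets[] (SSM or Secrets Manager)."""
    task_def = json.loads(task_def_json)
    return {
        secret["name"]
        for container in task_def.get("containerDefinitions", [])
        for secret in container.get("secrets", [])
    }


def bootstrap_gaps(bootstrap_keys: set[str], task_def_json: str) -> list[str]:
    """Secrets the container expects that a freshly provisioned environment would never get."""
    return sorted(task_definition_secret_names(task_def_json) - bootstrap_keys)
```

In our case, the three secrets from this finding are exactly what bootstrap_gaps would have returned.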

Finding 4

“Last rotated” is a lie everywhere.

SSM’s LastModifiedDate tells you when the value was uploaded TO SSM, not when the issuer rotated the underlying secret. We had Stripe live keys that SSM thought were “rotated 2026-04-23” but Stripe’s dashboard showed the actual key issuance was 2024. The cloud-store-modified-time and the issuer-rotated-time are the same number only on day one.

Lesson: if you care about rotation hygiene, you need to either record the issuer-side rotation timestamp manually, or accept that “last rotated” is a known-unknown. I’m not aware of any automated way to bridge this gap.
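Recording it manually can be as small as one extra field per secret. Here’s a Python sketch of what that record and the staleness check could look like; the field names are hypothetical, and the one-year threshold comes from the “older than a year” question earlier in this piece.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SecretRecord:
    name: str
    store_modified: date            # e.g. SSM LastModifiedDate: upload time only
    issuer_rotated: Optional[date]  # copied by hand from the issuer's dashboard


def rotation_status(record: SecretRecord, today: date, max_age_days: int = 365) -> str:
    """'unknown', 'stale' or 'ok', judged only on the issuer-side rotation date."""
    if record.issuer_rotated is None:
        # The store timestamp is deliberately ignored: a re-upload is not a rotation.
        return "unknown"
    age_days = (today - record.issuer_rotated).days
    return "stale" if age_days > max_age_days else "ok"
```

Treating “unknown” as its own status, rather than falling back to the store timestamp, keeps the known-unknown visible in the report.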

Finding 5

Public values still need rotation discipline.

NEXT_PUBLIC_SUPABASE_ANON_KEY is technically public. It’s baked into the browser bundle. People assume “public” means “no rotation needed.” It doesn’t. The anon key is the credential every browser request presents, and your RLS policies decide what it may do; rotating it without a coordinated bundle rebuild + cache-bust will break every live browser session.

Lesson: “public” and “rotates with no fuss” are different properties. The audit doc should track build-time-public values alongside server-side secrets, with the additional note that browser caches will hold the old value until they reload.
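Tracking build-time-public values can start from naming conventions. A Python sketch, assuming a Next.js app, where NEXT_PUBLIC_-prefixed variables are inlined into the browser bundle at build time; the VITE_ and REACT_APP_ prefixes are the equivalents in Vite and Create React App, included only as examples.

```python
# Prefixes whose values end up in the client bundle at build time.
PUBLIC_PREFIXES = ("NEXT_PUBLIC_", "VITE_", "REACT_APP_")


def split_by_exposure(names: set[str]) -> dict[str, list[str]]:
    """Separate build-time-public values from server-side secrets.

    Public values still need an inventory row: rotating one means a rebuild,
    a redeploy, and waiting out (or busting) browser caches.
    """
    public = sorted(n for n in names if n.startswith(PUBLIC_PREFIXES))
    server = sorted(n for n in names if not n.startswith(PUBLIC_PREFIXES))
    return {"build_time_public": public, "server_side": server}
```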

What it costs

Four hours of an engineer’s afternoon, with the agent doing the mechanical inventory work and the human doing two things: providing access (so the agent can inspect SSM, list IAM principals, read the deploy workflows) and reading the draft to catch what the agent couldn’t see (issuer dashboards, MFA configuration, actual team access versus what’s documented).

The output is a permanent document that pays back the next time anyone rotates any secret. The half-life of a production secret is somewhere between six months and never, and “never” is the wrong answer.

Why this matters

Most production systems accumulate secrets debt the same way they accumulate every other kind of debt: one well-justified addition at a time, with no overall view of what’s been added, where, or how the pieces interact. The day you need that overall view (for a rotation, an incident, an audit, a team handover) is also the day you don’t have time to build it.

An AI coding agent is the right tool for this work: grep for process.env.*, list SSM parameters, parse the deploy workflow YAML, cross-reference with .env.example, generate a table. The valuable judgement (what to do about the gaps) sits with the human. The agent does the inventory; you do the policy. Four hours of combined attention turns a permanent backlog item into a permanent reference.

Built on PlanB, a Bubble.io backup service. Stack: AWS ECS Fargate + SSM Parameter Store, Supabase Postgres, Stripe Checkout, Postmark, deployed via GitHub Actions OIDC. The audit ran against a system with 27 secrets across six stores and surfaced ten drift items. The work was done with Claude Code (Opus 4.7) in a single afternoon, with about four hours of human review on top to validate findings against the issuer dashboards.