Most SaaS apps ship without rate limits because the choices look bad: Cloudflare Pro is $300/year for five rules that can’t see your account_id; Upstash Redis is a new stateful dependency with its own failure mode; in-process token buckets break the moment you run two instances. There’s a fourth option — sliding-window counts in Postgres, served from a single SECURITY DEFINER RPC — that we shipped in an afternoon. This is the one-prompt setup, the four alternatives we rejected and why, and the five gotchas an independent reviewer surfaced (including one that would have silently broken every limit under concurrency).
The Problem That Nobody Notices Until They Get Hit
Walk through any early-stage B2B SaaS and you find the same gaps. The public accept-invitation endpoint has no per-IP throttle. The Lambda-backed test-connection route can be called 60 times a second from a stolen cookie. The account/deactivate route triggers a Stripe API call, a Lambda fan-out, and a DB cascade — and nothing stops a buggy frontend from firing it twenty times in a row.
Most of these aren’t security holes in the OWASP sense. They’re denial-of-wallet vectors. The attacker doesn’t break in; they make you spend until someone notices. A leaked invitation link burns Postmark sends. A bored teenager replays your endpoint for fun. The textbook answers all have problems, so most teams ship the gap.
Proper rate limiting should answer in 30 seconds: which IP just tried the invite-accept endpoint twenty times in 15 minutes; which account_id is responsible for 80% of this hour’s Lambda invocations; whether a leaked Bearer token can drain Postmark before you rotate; and when you 429 a real user, when should they retry. A throttle that can’t answer those isn’t doing the job.
Most of these aren’t security holes in the OWASP sense. They’re denial-of-wallet vectors. The attacker doesn’t break in; they make you spend until someone notices.
Why Not Redis, Cloudflare Pro, or an In-Process Bucket
Four obvious alternatives, four reasons each one is wrong for a small-team SaaS:
Cloudflare WAF rate-limit rules. What most teams reach for first. Cloudflare Free does not include configurable per-path rate-limit rules — those start at Pro ($25/month, five rules). Cloudflare sees only the source IP, not the authenticated account_id. Per-account spend caps — the most common shape of authenticated abuse — are impossible at the edge. The load balancer hostname behind Cloudflare is also discoverable; an attacker who finds it bypasses the edge entirely.
Upstash Ratelimit (managed Redis). Adds a new stateful dependency with its own failure mode — when Upstash has a bad day, do you fail open (lose the throttle) or fail closed (take your app down)? Either is worse than not having Redis. Free tier covers light usage; cross it and it’s another monthly bill on top of the database you already operate.
In-process token bucket. A Map<string, number> in memory. Works perfectly until you run more than one instance. Two ECS tasks each see half the traffic and each maintain their own counter, so the effective limit doubles. An attacker who lands on different tasks bypasses the limit entirely. The fix is a shared store — the dependency you were trying to avoid.
Stretching an existing audit table. The honest temptation: we already have admin_audit_log. But rate-limit events have a different retention story (days, not years), a different volume profile (one per request at scale), and a different naming intent. Mixing the two means the reaper risks deleting audit data, and every “show me security events” query has to filter out the noise.
The fifth option — a dedicated rate_limit_events table queried through a single Postgres RPC — has none of these problems. Postgres is already a stateful dependency you maintain. The RPC runs in the same connection pool as everything else. The cost is $0/month and one migration.
The One Prompt That Does the Code Work
Copy this into Claude Code (or Codex) — the first thing it does is discover your stack, so it works regardless of which Postgres flavour, ORM, or auth library you’re on:
Set up Postgres-counted rate limiting for this app.
═══ PHASE 0 — DISCOVER FIRST ═══
Before writing any code, inspect the codebase and write your findings to
`docs/rate-limiting.md`. Cover:
- Database flavour and access pattern. (Our reference setup used
Supabase Postgres with the supabase-js client + RPCs; yours may be
raw RDS, Neon, Prisma, etc.)
- Origin and edge posture. Are you behind Cloudflare? An ALB or NLB?
Multi-instance? This determines whether IP attribution is trustable
and from which header.
- Existing throttle patterns to avoid duplicating. Some apps already
have "count rows in domain table" patterns (e.g. invite-rate-limit
on a per-owner basis). Note them; don't migrate them yet.
- Auth shape for each route that needs throttling. Public/unauth?
Cookie session? Bearer? Knowing this maps to the public-vs-authed
wrapper choice in step 6.
- Whether the project regenerates DB types from the live schema and
where they live. THIS WILL BITE YOU — see step 10.
Then write the design + the bucket inventory + the implementation plan
in the same doc. Stop. I'll confirm before you implement.
═══ PHASE 1 — IMPLEMENT (after spec approval) ═══
1. SCHEMA — generic `rate_limit_events` table.
- `id BIGSERIAL PK, bucket TEXT NOT NULL, ip INET, identifier TEXT,
succeeded BOOLEAN DEFAULT FALSE, metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()`
- Two partial indexes for the two count shapes:
(bucket, ip, created_at DESC) WHERE ip IS NOT NULL
(bucket, identifier, created_at DESC) WHERE identifier IS NOT NULL
- One created_at index for the reaper.
- `ENABLE ROW LEVEL SECURITY` with NO policies — service-role only.
REVOKE ALL FROM PUBLIC, anon, authenticated; GRANT ALL TO service_role.
2. RPC — `check_and_record_rate_limit`.
- SECURITY DEFINER, `SET search_path = public`, `statement_timeout = 5s`
(re-applied after CREATE OR REPLACE — it gets dropped on re-create).
- First statement in the body: `IF auth.role() <> 'service_role' THEN
RAISE EXCEPTION 'forbidden'; END IF;`
- REVOKE EXECUTE FROM PUBLIC/anon/authenticated; GRANT to service_role.
3. THE CONCURRENCY GOTCHA — per-scope advisory locks. THIS IS NOT OPTIONAL.
- Inside the RPC, BEFORE the INSERT:
PERFORM pg_advisory_xact_lock(
hashtextextended(p_bucket||':ip:'||host(p_ip), 0));
PERFORM pg_advisory_xact_lock(
hashtextextended(p_bucket||':id:'||p_identifier, 0));
- WHY: under READ COMMITTED (Postgres default), 100 parallel requests
hitting the same (bucket, scope) each see only their own uncommitted
insert. Each counts 1. Each passes a limit of 10. All 100 commit.
The limit is silently broken exactly when an attacker is most active.
Advisory locks serialise per (bucket, scope); unrelated scopes never
contend. They release at COMMIT/ROLLBACK automatically.
4. BOUNDED COUNTS — `LIMIT p_limit + 1` subquery.
- `SELECT COUNT(*) FROM (SELECT 1 FROM rate_limit_events
WHERE bucket = p_bucket AND ip IS NOT NULL AND ip = p_ip
AND created_at > now() - make_interval(secs => p_window_seconds)
LIMIT p_ip_limit + 1) bounded;`
- WHY: exact COUNT(*) grows linearly with attack volume. Once you
know it's "≤ limit" or "> limit" you have enough to enforce —
you don't need the precise number. Bounded counts keep the read
cost flat under abuse.
- INCLUDE the explicit `ip IS NOT NULL` predicate so the planner uses
the partial index. Without it, parameterised queries can miss the
partial-index match.
5. INPUT VALIDATION — RAISE in SQL + throw in TS.
- In SQL: null bucket, zero/negative windows, identifier-limit
without identifier-window — all RAISE P0001. The SDK silently
disabling a misconfigured limit is the worst failure mode.
- In TS: mirror the checks at the top of the helper and `throw`
before the RPC round-trip. Fail fast at the call site.
6. TYPESCRIPT HELPER — `src/lib/rate-limit.ts`.
- Core function `checkAndRecordRateLimit(opts)` returns
`{ allowed, ipCount, identifierCount, retryAfterSeconds }`.
- TWO NAMED WRAPPERS — DO NOT EXPOSE THE BARE `failClosed` FLAG:
`checkPublicRateLimit(opts)` → failClosed: true (denies on RPC
error; right default for unauth endpoints where the throttle
IS the auth).
`checkAuthedRateLimit(opts)` → failClosed: false (allows on RPC
error; the route's own auth gate is the primary defence).
Engineers wiring new routes pick by function name, not by setting
a boolean. The intent is grep-able.
7. IP ATTRIBUTION — `getClientIp(headers)`.
- Trust `cf-connecting-ip` ONLY when `cf-ray` is ALSO present.
- If only one is present, return null and fall back to identifier-only.
- DO NOT consult `x-forwarded-for` or `x-real-ip` for security
purposes — those are trivially client-settable when not behind a
trusted proxy. The Free CF plan is the ONLY trusted proxy here.
- WHY: anyone who finds the ALB/origin hostname can hit your app
directly and set cf-connecting-ip themselves to rotate their IP
bucket. The cf-ray gate is your only on-CF check.
8. WIRE A ROUTE.
- Throttle BEFORE the expensive operation, so denied attempts don't
burn the Lambda call / Stripe API call / Postmark send.
- On 429: return BOTH a `Retry-After` header AND a
`retry_after_seconds` field in the JSON body. The browser `fetch`
API drops the header through some CORS configurations; the client
reads the body preferentially.
- In the error copy: "Please try again in X minutes" — format the
duration human-readable, not "Try again later."
9. THE ANTI-LIST — routes that MUST NEVER be rate-limited.
- Inbound Lambda callbacks (Bearer-authed, high-volume legitimate
traffic from a known source).
- Stripe webhook (signature-authed; Stripe retries with backoff but
a 429 still counts against the retry budget).
- Cron / orchestrator endpoints (Bearer-authed; 429 = the cron tick
is lost, schedules skip a beat).
- Document these in the design doc as an explicit anti-list. The
mistake of "let's just add a throttle everywhere" causes data loss,
not abuse mitigation. Use a queue + worker if back-pressure is
genuinely needed.
10. DB-TYPES REGEN — THE BUG THAT WILL BITE YOU.
- If the project uses generated DB types (Supabase, Prisma, etc.):
regenerating from one environment may DROP types for tables that
exist on a DIFFERENT environment but not in local migrations
(e.g. read-only shadow tables, legacy callback tables).
- SAFE APPROACH: `git checkout HEAD -- <generated types file>` to
restore, then HAND-ADD only the new entries for rate_limit_events
+ the two RPCs. Diff should be additive only — verify with
`git diff --numstat` and confirm 0 deletions.
11. pgTAP REGRESSION TEST — pin the contract.
- Function exists with the expected signature (`has_function`).
- Allow→deny boundary at p_limit + 1.
- Denied attempts STILL record a row (record-before-check semantics).
- Window expiry — rows older than the window don't count.
- Bucket isolation — fresh bucket starts at 1 even when another
bucket has many rows for the same IP.
- Identifier-only mode (p_ip = NULL) works.
- Identifier-limit-without-window raises P0001 (the input-validation
regression).
- `anon` and `authenticated` get permission denied on the RPC.
12. DESIGN DOC — `docs/rate-limiting.md` is the canonical record. It
contains: rejected alternatives + WHY, bucket inventory (route,
wrapper, per-IP, per-identifier, identifier shape, rationale),
rule-of-thumb for choosing limits, the anti-list from step 9, and
a failure-modes table.
═══ PHASE 2 — VERIFY before shipping ═══
- pgTAP suite green (the input-validation and bucket-isolation
assertions are the high-value ones — make sure they're pinned).
- TypeScript helper unit tests green: cf-ray gate, fail-closed default,
fail-open via authed wrapper, input-validation throws.
- typecheck and lint clean.
- One end-to-end smoke: hit the route 12 times in 60 seconds, confirm
the 11th and 12th return 429 with a Retry-After matching the window.
Ship in 3 small commits, NOT one mega-PR: (a) the migration + pgTAP,
(b) the TS helper + tests + the first route wire-up, (c) the design doc.
Each independently deployable.
Adjust the bucket list and limits to your product. The structure of the prompt — discovery, the named gotchas, the verification step — is what does the work.
What the prompt is doing:
- Discovery first. Phase 0 forces the agent to inspect your stack before assuming anything. It names our reference setup (Supabase Postgres + supabase-js + Cloudflare) so the agent has concrete examples to compare against, and tells it to adapt to what’s actually there.
- The concurrency hole called out by name. Section 3 explains why (READ COMMITTED + uncommitted inserts), gives the exact lock-key formula, and notes that the locks release at commit. An independent reviewer caught this in our first version; no Postgres tutorial we’d read had mentioned it.
- Named wrappers, not boolean flags. Section 6 forbids exposing the bare flag because the fail-policy choice is too important to leave to a parameter someone forgets to set. The function name carries the intent.
- The anti-list as a first-class section. Section 9 names the routes that must never be rate-limited. Product-domain wisdom, not generic Postgres advice. Every Lambda-backed app has these routes.
- The types-regen footgun as a numbered step. Section 10 is there because we hit it. Regenerating types from a local environment silently dropped six tables that existed on prod but not in local migrations.
From this single prompt the agent writes the migration, the RPC with advisory locks and bounded counts and input validation, the TS helper with two named wrappers and the cf-ray IP gate, the pgTAP regression with eleven contract-pinning assertions, the design doc with the rejected alternatives table and the bucket inventory, and the route wire-up for the first endpoint. Three to four commits over an afternoon.
What Actually Goes Wrong (Real Gotchas From a Real Setup)
The prompt covers most of the architecture, but five gotchas surfaced during execution — three from an independent code review by a second AI agent — that are worth knowing before you start:
Under READ COMMITTED, your rate limit silently does nothing under concurrency.
Our first RPC inserted the row, counted the window, compared against the limit. Each invocation in its own transaction. Under Postgres’s default READ COMMITTED isolation, transaction A can’t see transaction B’s uncommitted insert — two simultaneous requests each see a count of 1, each pass a limit of 10, each commit. Extend to 100 simultaneous requests and you have 100 rows past a limit of 10.
The fix is pg_advisory_xact_lock(hashtextextended(bucket||':ip:'||host(ip), 0)) — a transaction-scoped lock per (bucket, scope) that serialises concurrent transactions on the same key. Unrelated scopes don’t block; locks release at COMMIT/ROLLBACK.
Lesson: insert-then-count under MVCC requires per-scope serialisation. The obvious implementation fails silently exactly when the throttle matters.
Some routes must never be rate-limited — and they’re the obvious-looking ones.
When you ship rate limits, the temptation is to apply them everywhere. The routes we deliberately excluded are documented as an anti-list in the design doc. Lambda backup callbacks come from a known Bearer-authed source — a 429 means a lost backup record, not abuse mitigation. Stripe webhooks are signature-authed; a 429 still counts against Stripe’s retry budget. Cron orchestrator endpoints get one tick per minute; throttling them means a schedule beat is silently skipped.
Lesson: rate limiting is a posture, not a default-on policy. If back-pressure is genuinely needed on a write-from-trusted-source endpoint, the answer is a queue with a concurrency cap, not a 429.
Cloudflare-fronted IPs are trustworthy only if you verify the request actually transited Cloudflare.
Our first version trusted cf-connecting-ip whenever it was present. The independent review pointed out: anyone who finds your origin hostname can hit the app directly and set cf-connecting-ip themselves. The fix is to require BOTH cf-connecting-ip AND cf-ray — Cloudflare always emits both on a proxied request, and cf-ray is the verifiable signal that the edge actually saw it. Without cf-ray, treat the IP as untrusted and fall back to identifier-only throttling.
Lesson: a spoofable header behaves the same as no header until you check the signature.
Regenerating database types from one environment silently deletes types defined in another.
Our project keeps a generated database.types.ts file. After applying the migration locally we ran the regen — diff was +595 / −530. The +595 was the new table and RPC types. The −530 was six legacy callback tables that exist on prod but not in local migrations; local doesn’t know about them, so the regen happily dropped them.
The fix is git checkout HEAD -- src/lib/database.types.ts to restore, then hand-add only the new entries. Diff goes from “additive plus destructive” to additive only: +52 / −0.
Lesson: schema regeneration is silently destructive across heterogeneous environments. Always confirm the diff is purely additive before committing generated files.
Exact COUNT(*) makes your throttle slower as the attack gets bigger.
The naive count is SELECT COUNT(*) FROM rate_limit_events WHERE bucket = $1 AND ip = $2 AND created_at > $3. As the attacker pounds the endpoint, the window fills with denied-attempt rows, and the count gets slower. By the time the attacker hits 10k attempts/minute, the throttle itself adds latency.
The fix is a bounded subquery: SELECT COUNT(*) FROM (SELECT 1 FROM rate_limit_events WHERE ... LIMIT p_limit + 1) bounded. Once Postgres has read p_limit + 1 rows, it knows the answer is “> limit”. Cost stays flat regardless of attack volume.
Lesson: enforcement only needs to know “≤ limit” or “> limit”, not the precise number. Pay accordingly.
The obvious implementation fails silently exactly when the throttle matters. The interesting work is the gotchas — the implementation steps were the easy half.
What This Actually Costs
About a day of an engineer’s attention, including the independent review pass. The agent does the implementation; the human’s job is to choose the bucket limits (which involves looking at legitimate user behaviour), review the pgTAP and unit tests, run the deploys, and do the end-to-end smoke. Ongoing operational cost is zero — no new infrastructure, no new bills, the reaper is a SQL function the existing pg-boss can call. Under a day of work for a throttle that holds under load, on infrastructure you already operate, that can answer per-account and per-IP questions Cloudflare can’t.
The Broader Point
Rate limiting gets pushed to “later” because the standard options all have visible costs. The Postgres pattern doesn’t, but it has invisible ones — the concurrency hole, the IP-attribution trap, the regen footgun, the temptation to apply it everywhere. Those are exactly the kind of details an AI agent will get right if you tell it about them up front, and will get wrong if you don’t. The prompt above is what we’d give a future agent if we were starting this work over. The gotchas are the artefact; the implementation steps were the easy half.
Built on PlanB, a Bubble.io backup service. Stack: Postgres on Supabase, supabase-js with SECURITY DEFINER RPCs, Next.js 16 App Router, Cloudflare Free in front. Implementation by Claude Code (Opus 4.7); independent review pass by Codex CLI (0.137). About a day of human review and decision time spread across an afternoon.