Every production SaaS has 3rd-party integrations crossing its boundary in both directions — webhooks and callbacks pushing data IN, API calls and outbound webhooks pushing data OUT — and most teams have a record of neither. When inbound writes fail (a deploy ships a bug, a constraint changes, a retry budget expires), the lost payloads vanish; you find out from a customer weeks later. When an outbound call fails or gets retried, you can’t tell whether the 3rd party actually got it — and “did Stripe charge that customer?” becomes a panicked tab-switch instead of a 30-second log query. The fix in both directions is the same: one structured log line emitted at the chokepoint, BEFORE the inbound write and AROUND every outbound attempt. Half a day to implement; it pays for itself the first time it’s needed. Today, the inbound version of this caught a 2.5-hour callback-pipeline outage in our system and isolated exactly one lost row out of 40 attempts; the outbound version has, on past incidents, settled “did the API call go through” arguments with Stripe and Postmark support in minutes. Here’s the business case, the one prompt that wires both directions, and the gotchas.
The Failure Mode Nobody Plans For
Two flavours of the same story.
Inbound. A 3rd party pushes data to you. You write a webhook handler, a callback receiver, a queue consumer. It works for months. Then a deploy ships a migration that subtly breaks the upsert — a partial unique index can’t be inferred by the ORM’s onConflict call, say — and every inbound write returns 500 for two hours before anyone notices. The 3rd party retries. Each retry also gets 500. Eventually their retry budget runs out. The payload is gone. The data the customer sent you exists in their system; it does not exist in yours.
The question that lands on your desk three weeks later: “My customer says they ran a backup on Tuesday and it isn’t showing in their dashboard. Can you check?” Sentry shows 500s but not request bodies. The deploy that caused it was four releases ago. The customer’s evidence is a screenshot from a week ago. You apologise, refund, and quietly hope they don’t churn.
Outbound. You call Stripe to create a subscription. Your HTTP client makes three attempts: the first fails, the second actually succeeds but its response is lost on the network, and the third gets back a 409 conflict, which your code surfaces as “subscription creation failed.” The customer’s card is charged. Your system says they aren’t a paying customer. A support thread opens. Stripe says “we received your request.” You can’t prove or disprove it, because you never logged what you sent or what came back.
Both failure modes are silent — nothing pages. Both are expensive at the wrong moment: the question lands without warning, and the data you need to answer it is exactly the data nobody saved. The fix is the same on both sides of the boundary: log every payload that crosses it.
What “Doing It Right” Looks Like
Five pieces, in parallel for both directions.
1. Find every chokepoint where data crosses the boundary. Inbound: webhook handlers, queue consumers, Lambda destination handlers, scheduled jobs that poll a remote API and persist. Outbound: HTTP client instances (an axios.create(), an ofetch wrapper, a fetch helper), SDK clients (Stripe, Postmark, Twilio, AWS SDK, partner API libraries). Specifically not in-app writes or in-app function calls — those you can replay from your own request history. The chokepoints worth instrumenting are the ones where the request or response leaves your control.
2. Wrap each chokepoint with a structured audit log emission. Inbound: one line emitted BEFORE the write, carrying the full inbound payload. Outbound: two lines AROUND every attempt, one BEFORE the call with the request and one AFTER the call with the response (status, body, duration). Both written to stdout, both landing wherever your existing logs go (CloudWatch, Stackdriver, Datadog, Loki). It costs one console.log call per line.
3. Give the audit line the right shape. A greppable message field so a log search finds them in seconds. An audit_version so a future shape change doesn’t silently invalidate older entries. An idempotency_key (inbound) or correlation_id (outbound) derived from natural business fields, NOT from synthetic IDs you mint at insert time. A payload_sha256 over the canonical JSON of the payload so a replay tool can detect a truncated log line. The full payload. Origin metadata: function name, log group, trace id.
4. Write a one-page recovery runbook. Inbound version: when the webhook was 500ing between 13:23 and 16:10, run one log query, export to NDJSON, diff the idempotency keys against the destination table, re-insert missing rows. Outbound version: when support asks “did we charge this customer,” run one log query filtered to the customer’s correlation_id and read the full request + response chain. Half a page of bash and SQL each. The runbook lives next to the emitter.
5. Wire a count-parity canary into your daily ops check. Inbound: audit-line count vs destination-table row count over the last 24h. Outbound: audit-line count vs the 3rd-party dashboard’s call count for the same window where comparable (Stripe, Twilio, Postmark all expose this). A sustained delta = silent loss or silent over-call in flight. A canary makes the audit log find the failures you don’t know about, not just the ones you do.
None of these pieces are difficult. The hard part is admitting that the system needs them before the first incident proves it.
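To make pieces 2 and 3 concrete for the inbound direction, here is a minimal sketch assuming Node.js and TypeScript. The field names follow the audit shape described above; the helper names `canonicalJson` and `buildInboundAudit`, the `|` separator in the key, and the sorted-keys canonical form are illustrative assumptions, not the implementation used in the incident. Origin metadata is left out for brevity.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so the same payload always hashes the same
// regardless of property order (one possible "canonical JSON"; an assumption).
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const entries = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`);
  return `{${entries.join(",")}}`;
}

// Natural key fields per destination table (example entries only).
const AUDIT_KEY_FIELDS: Record<string, string[]> = {
  backup_results: ["s3_key", "csv_file_part"],
  stripe_webhook_events: ["stripe_event_id"],
};

function buildInboundAudit(destination: string, payload: { [key: string]: Json }) {
  const keyFields = AUDIT_KEY_FIELDS[destination] ?? [];
  return {
    timestamp: new Date().toISOString(),
    level: "info",
    message: "boundary_audit", // greppable
    audit: "inbound_write_attempt",
    audit_version: 1,
    destination,
    idempotency_key: keyFields.map((f) => String(payload[f])).join("|"),
    payload_sha256: createHash("sha256").update(canonicalJson(payload)).digest("hex"),
    payload,
  };
}

// Emission must never throw and break the real write.
function emitInboundAudit(destination: string, payload: { [key: string]: Json }): void {
  try {
    console.log(JSON.stringify(buildInboundAudit(destination, payload)));
  } catch (err) {
    console.log(JSON.stringify({ level: "error", message: "boundary_audit_failed", destination, error: String(err) }));
  }
}
```

Calling `emitInboundAudit("backup_results", row)` on the line before the upsert is the whole integration. Because the hash is taken over the sorted-key form, a replay tool can recompute it from the logged `payload` and detect a truncated line.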
The One Prompt That Does the Work
Paste this into Claude Code (or Codex). Phase 0 forces it to map your inbound and outbound boundary crossings before assuming anything, so it works regardless of stack.
Add audit logging for every 3rd-party boundary crossing — inbound
payloads we receive and outbound calls we make — so we can recover
lost data and answer "did this call actually go through" questions
from the logs.
═══ PHASE 0 — DISCOVER FIRST ═══
Before writing any code, inspect the codebase and write your findings
to `docs/boundary-audit.md`. Cover BOTH directions.
INBOUND — every code path where a 3rd party PUSHES data INTO us.
Patterns to find:
- Webhook handlers (Stripe, Slack events, GitHub, Postmark inbound,
custom callback routes)
- Queue consumers (SQS subscribers, pg-boss handlers ingesting
external messages, Kafka consumers)
- Scheduled poll-ingestion (cron tasks that read a 3rd-party API
and persist)
- Lambda / Worker / Function destinations that POST callbacks back
to us
- The DESTINATION of each path — the table being written, and the
insert/upsert chokepoint. If multiple routes funnel through a
shared `.upsert()` wrapper, instrument the wrapper once.
OUTBOUND — every code path where WE call OUT to a system we don't
control. Patterns to find:
- HTTP client instances (axios.create, ky, ofetch, fetch wrappers)
- SDK clients (Stripe, Postmark, Twilio, AWS SDK, OpenAI/Anthropic,
partner-API libraries)
- Retry policies in place (axios-retry, fetch-retry, native retry
loops). Note the retry budget — these become first-class events
in the audit, one entry per attempt.
- Identify whether you have ONE shared HTTP client wrapper (often
you do — instrument the wrapper) or many ad-hoc ones (consolidate
into a wrapper first, THEN instrument).
For our reference setup, inbound paths were 5 webhook routes at
`src/app/api/webhooks/*/route.ts`; outbound was a single SDK call site
per provider with no shared HTTP wrapper. Your project may have inbound
under `pages/api/`, a NestJS controller, an Express router; outbound
may be a Vercel API route hitting Stripe via the official SDK, a
Cloudflare Worker calling a partner API via fetch, etc.
Also cover:
- The OBSERVABILITY backend that captures stdout for these paths
(AWS CloudWatch, Stackdriver, Datadog, Loki, Vercel Logs). Note
the QUERY interface so the runbook can reference exact syntax.
- RETENTION POLICY on those log groups. Recovery is bounded by
retention — 14 days minimum, 90 days more comfortable.
Then write the implementation plan in the same doc. Stop. I'll
confirm before you implement.
═══ PHASE 1 — IMPLEMENT (after spec approval) ═══
1. EMITTER MODULE — single file (e.g. `lib/audit/boundary.ts`)
exporting TWO functions:
emitInboundAudit(destination, payload)
emitOutboundAudit(phase, correlation_id, attempt, payload)
Both wrapped in try/catch — audit emission MUST NOT throw and
break the real call. Worst case: emit `boundary_audit_failed`
and continue.
2. INBOUND EMITTER — emits one JSON line BEFORE each insert/upsert
that ingests external data:
    { timestamp, level: "info",
      message: "boundary_audit",        // greppable
      audit: "inbound_write_attempt",
      audit_version: 1,
      destination,                      // table name
      idempotency_key,                  // from natural fields
      payload_sha256,
      payload,                          // full row
      origin: { function, log_group, trace_id } }
BEFORE, not AFTER. The case to recover from is the write that
DIDN'T happen.
3. OUTBOUND EMITTER — emits TWO JSON lines per attempt: one with
`phase: "request"` BEFORE the call, one with `phase: "response"`
AFTER. Both share the same correlation_id; attempt number
increments on each retry.
    { timestamp, level: "info",
      message: "boundary_audit",
      audit: "outbound_attempt",
      audit_version: 1,
      phase: "request" | "response",
      correlation_id,                   // stable across retries
      attempt,                          // 1, 2, 3, ...
      destination,                      // e.g. "stripe.subscriptions.create"
      request: { method, url, body, headers_safe },    // phase=request
      response: { status, body_safe, duration_ms },    // phase=response
      error: { name, message, code } | null,           // phase=response
      origin: { function, log_group, trace_id } }
Generate correlation_id ONCE per logical call (e.g. one Stripe
subscription-create), reuse across all retries. Where the 3rd
party supports it (Stripe, Square, several others), set the
`Idempotency-Key` header to the correlation_id so the remote
dedupes your retries server-side too.
4. KEY MAP — a constant in the same module:
    AUDIT_KEY_FIELDS = {
      inbound: {
        'backup_results': ['s3_key', 'csv_file_part'],
        'stripe_webhook_events': ['stripe_event_id'],
        'inbound_emails': ['message_id'],
      },
      outbound: {
        // generation strategy for correlation_id per destination
        'stripe.*': 'idempotency-key-from-business-context',
        'postmark.send': 'message-id-after-response',
        default: 'crypto.randomUUID()',
      }
    }
For inbound: fields MUST be natural identifiers set deterministically
by the source (Stripe event id, S3 object key, Message-Id). NEVER a
synthetic id minted at insert time — replays would mint a fresh one
and break dedup.
For outbound: correlation_id should be derived from the business
action where possible (subscription_id, order_id) so it's
reconstructible from your DB even if the call failed.
5. WIRE INTO THE CHOKEPOINTS:
- INBOUND: call `emitInboundAudit()` immediately BEFORE each
insert/upsert that ingests external data. If multiple routes
share a wrapper, instrument the wrapper.
- OUTBOUND: wrap the HTTP client (axios instance, fetch wrapper,
SDK client). Emit "request" before the call, "response" after.
For retry loops (axios-retry's onRetry hook, native loops),
ensure each attempt produces its own request+response pair.
One call → N pairs, where N = final attempt count.
6. RECOVERY RUNBOOK — short doc at `docs/boundary-recovery.md` with
two procedures:
INBOUND recovery — when a destination table is suspected of
missing rows:
a. Pin the broken window: deploy time of the breaking code →
deploy time of the fix.
b. Log query: `message="boundary_audit" AND audit="inbound_write_attempt"`
+ window. Export to NDJSON.
c. Dedupe by idempotency_key (retries emit one audit per attempt).
d. LEFT JOIN against the destination table on the idempotency-
key column → list missing rows.
e. For each missing row, reconstruct the insert from the audit's
`payload`. Re-insert with ON CONFLICT DO NOTHING.
Note on field-name divergence: if the inbound payload's field
names don't match the destination's column names (we hit this:
inbound `has_completed_download_all_records_normally` mapped to
column `has_completed_normally`), reuse the route handler's
transformation in the recovery script.
OUTBOUND lookup — when support asks "did we call X for customer Y":
a. Reconstruct the correlation_id for the call (from the
business context: subscription id, order id, etc).
b. Log query: `message="boundary_audit" AND audit="outbound_attempt"
AND correlation_id="..."` — returns all attempts in order.
c. Read request + response chain. Answer the question.
For Stripe / Postmark / Twilio support, the response.body
typically contains the provider's own request id (e.g. Stripe's
`request-id` header echoed in errors) — quote it back.
7. COUNT-PARITY CANARY — wire into the daily ops report:
- INBOUND: audit-line count vs destination-table row count over
last 24h. Sustained delta = silent loss in flight.
- OUTBOUND: audit-line count vs the 3rd-party dashboard's call
count for the same window where the provider exposes it (Stripe
API logs, Postmark activity, Twilio insights). Without this
canary the audit log helps only incidents you already noticed.
8. PII / SECRETS DISCIPLINE — both directions carry data you may not
want in logs.
- INBOUND payloads: bearer tokens in headers, API keys, customer
PII. Audit each shape; decide explicitly per shape: redact or
accept. Document the decision.
- OUTBOUND request.body and headers: Authorization header (always
redact), customer email if PII-restricted, payment details (PCI
scope concerns). The `headers_safe` and `body_safe` field names
above are deliberate — the helper must strip known-sensitive
fields by default.
═══ PHASE 2 — VERIFY before shipping ═══
- Inbound audit emits on a successful write — verify in the log
backend within seconds.
- Inbound audit emits on a THROWN write — wrap a test handler in
a forced exception, confirm the audit still landed.
- Outbound audit emits ONE request+response pair per attempt,
including retries — force a transient failure, confirm 2-3 pairs
sharing one correlation_id appear.
- Outbound audit emits on a network error too (no response
received) — kill the network, confirm a phase=response line with
`error` set still lands.
- Recovery runbook is end-to-end testable for both directions: pick
one audit from the last 24h and complete the lookup / replay
dry-run.
- Count-parity canary runs cleanly with no false positives that
drown the signal.
- Log group retention on every audit-emitting service is ≥ 14 days.
Ship as 3 PRs: (a) emitter module + tests, (b) inbound chokepoint
wiring + inbound recovery runbook, (c) outbound HTTP wrapper +
outbound runbook + count-parity canary.
Swap the framework, the queue, the SDK, the logging backend, the database. The structure does the work.
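As an illustration of the inbound recovery procedure in the prompt (dedupe by idempotency_key, then list what the destination table is missing), here is a minimal TypeScript sketch. It assumes the log export is NDJSON text and that the keys already present in the destination table have been loaded into a `Set`. The names `findMissingRows` and `parseLine` are hypothetical. In practice the comparison may be easier as a SQL LEFT JOIN, as the runbook describes.

```typescript
interface InboundAudit {
  message: string;
  audit: string;
  idempotency_key: string;
  payload: Record<string, unknown>;
}

// Parse one exported log line; truncated or non-object lines yield null.
function parseLine(line: string): InboundAudit | null {
  try {
    const parsed: unknown = JSON.parse(line);
    if (typeof parsed !== "object" || parsed === null) return null;
    return parsed as InboundAudit;
  } catch {
    return null;
  }
}

// Returns one audit entry per idempotency key that has no row in the destination.
function findMissingRows(ndjson: string, existingKeys: Set<string>): InboundAudit[] {
  const seen = new Set<string>();
  const missing: InboundAudit[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    const entry = parseLine(line);
    if (entry === null) continue;
    if (entry.message !== "boundary_audit" || entry.audit !== "inbound_write_attempt") continue;
    if (seen.has(entry.idempotency_key)) continue; // retries emit one audit per attempt
    seen.add(entry.idempotency_key);
    if (!existingKeys.has(entry.idempotency_key)) missing.push(entry);
  }
  return missing;
}
```

Each returned entry carries the full `payload`, so the re-insert can be built from it directly, using the route handler's own field mapping where payload and column names differ.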
Notice What the Prompt Is Doing
- Both directions in one module. Inbound and outbound share enough shape (greppable message, version field, correlation key, full payload, origin metadata) that keeping them in one module, with two emitter functions, is the simplest design. The query interface is unified: one message="boundary_audit" filter finds everything.
- Per-attempt audits, not per-call. Outbound retries each produce their own request+response pair sharing a correlation_id. A “one log line per call” design would hide the fact that attempt 2 succeeded after attempt 1 errored, which is exactly the case where you need the full chain to reason about charges, duplicates, or partial failures.
- Idempotency-Key header propagation. Where the 3rd party supports it (Stripe, Square, others), reusing the audit’s correlation_id as the outgoing Idempotency-Key header gives you server-side retry dedup at the same time as audit logging. Two birds, one stone.
- Recovery has TWO runbooks. Inbound replay (rebuild missing rows from audits) and outbound lookup (answer “did we call X” from audits) are different operator workflows. Both need to exist on day one; both are short.
- The canary turns reactive into proactive. Inbound: audit count vs DB count. Outbound: audit count vs 3rd-party dashboard count. Wired into the daily ops check, the audit log surfaces silent loss within 24h instead of weeks.
What Actually Bit Us (Real Gotchas)
Four findings worth knowing before you implement.
Emit BEFORE the inbound write, AROUND every outbound attempt.
The case you need to recover from is the call that didn’t complete cleanly — the upsert that raised, the HTTP request that timed out, the retry that succeeded after the previous attempt errored. An after-only emitter on either side logs only the easy cases. Inbound: emit before the write, full stop. Outbound: emit a “request” line before each attempt AND a “response” line after, even if the response is a network error with no status code.
Lesson: the audit is a record of intent and outcome at every attempt, not a record of attempts-that-eventually-succeeded.
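A minimal sketch of the inbound ordering, assuming the write is passed in as a function. The name `auditedWrite` and the injectable `sink` parameter are hypothetical; the point is only that the audit line is emitted before the write is attempted, so it survives a write that throws.

```typescript
type Sink = (line: string) => void;

// Emits the audit line first, then runs the write. If the write throws,
// the error propagates to the caller, but the audit line has already landed.
async function auditedWrite<T>(
  destination: string,
  payload: Record<string, unknown>,
  write: () => Promise<T>,
  sink: Sink = (line) => console.log(line),
): Promise<T> {
  try {
    sink(
      JSON.stringify({
        message: "boundary_audit",
        audit: "inbound_write_attempt",
        audit_version: 1,
        destination,
        payload,
      }),
    );
  } catch {
    // Audit emission must never block or break the real write.
  }
  return write();
}
```

An after-only version would place the `sink` call below `write()`, which is exactly where a raised upsert never reaches it.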
Key on natural identifiers, never on synthetic IDs you mint at insert time.
Inbound: our idempotency key was s3_key — the S3 object key the source Lambda was about to write. Deterministic from the source. Same inbound payload → same key, always. If we had keyed on the destination row’s UUID id (auto-generated by Postgres at insert time), every replay would have produced a fresh UUID and dedup would have been useless.
Same principle outbound: correlation_id should be derivable from the business context (subscription id, order id) so support can reconstruct it from your DB even if the call failed and never created a row.
Lesson: the idempotency / correlation key has to be a property of the thing being communicated, not of the row you’d create from it. Stripe event ids, S3 object keys, email Message-Ids, subscription ids — never the surrogate primary key of your destination table.
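For the outbound side, one way to make the correlation_id reconstructible is to build it from the destination name plus a business identifier you already store. The sketch below is an assumption, not the scheme used in the incident; `correlationIdFor` and the optional `discriminator` argument are hypothetical names. If the same business object can legitimately trigger the same call more than once, the discriminator (a billing period, a version number) keeps those calls from sharing a key.

```typescript
import { randomUUID } from "node:crypto";

// Deterministic: the same business action always yields the same id,
// so support can rebuild it from the database without the original log line.
function correlationIdFor(destination: string, businessId: string, discriminator?: string): string {
  const parts = [destination, businessId];
  if (discriminator !== undefined) parts.push(discriminator);
  return parts.join(":");
}

// Anti-pattern for comparison: a synthetic id minted at call time.
// A replay or a support lookup cannot regenerate it.
function syntheticId(): string {
  return randomUUID();
}
```

With this scheme, `correlationIdFor("stripe.subscriptions.create", "order_42")` can be recomputed from the order row months later, whereas two calls to `syntheticId()` never agree.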
One outbound audit per ATTEMPT, not per call.
Most retry libraries (axios-retry, fetch-retry, native loops) wrap the entire retried call in a single “logical call” abstraction. If your audit hooks into that abstraction at the top level, you get ONE audit line for a call that actually fired three HTTP requests across the network — and you lose the ability to reason about whether the second attempt succeeded silently before the third was issued.
This is the exact pattern that leaves a customer charged on Stripe while your own system records a failure: the first call timed out at the client, the request actually completed server-side, the retry got a 409 conflict, the code reported failure, and the customer disputes the charge a week later. The audit needs to land at the lowest level (the HTTP transport, the individual SDK request hook) so every wire-level attempt is its own row in the log.
Lesson: instrument the transport, not the orchestrator. One audit pair per network attempt, sharing a correlation_id with its siblings.
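A sketch of per-attempt auditing with a hand-written retry loop, assuming the real network call is passed in as `send` (a stand-in for a fetch call or SDK request, not a real library API). `callWithAudit`, the retry-on-5xx rule, and the absence of backoff are simplifications for illustration.

```typescript
type Emit = (entry: Record<string, unknown>) => void;

interface AttemptResult {
  status: number;
  body: unknown;
}

// Runs `send` up to `maxAttempts` times. Every attempt emits one "request"
// line and one "response" line, all sharing the same correlation_id.
async function callWithAudit(
  destination: string,
  correlationId: string,
  send: (attempt: number) => Promise<AttemptResult>,
  emit: Emit,
  maxAttempts = 3,
): Promise<AttemptResult> {
  // Emission must never break the real call.
  const safeEmit: Emit = (entry) => {
    try {
      emit(entry);
    } catch {
      /* swallow audit failures */
    }
  };
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const base = {
      message: "boundary_audit",
      audit: "outbound_attempt",
      audit_version: 1,
      destination,
      correlation_id: correlationId,
      attempt,
    };
    safeEmit({ ...base, phase: "request" });
    const started = Date.now();
    let result: AttemptResult | null = null;
    try {
      result = await send(attempt);
    } catch (err) {
      // Network error: no status code, but the response line still lands.
      const e = err as Error;
      safeEmit({ ...base, phase: "response", response: null, error: { name: e.name, message: e.message } });
      lastError = err;
      continue;
    }
    safeEmit({
      ...base,
      phase: "response",
      response: { status: result.status, duration_ms: Date.now() - started },
      error: null,
    });
    if (result.status < 500) return result; // success or non-retryable client error
    lastError = new Error(`HTTP ${result.status}`);
  }
  throw lastError;
}
```

With a library such as axios-retry the same effect needs hooks at the request level rather than around the whole retried call. The loop above just makes the "one pair per network attempt" rule visible.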
The count-parity canary is what turns the audit log from forensic into detective.
Today’s inbound incident was loud — error logs were screaming 42P10. We knew something was wrong; the audit helped us measure the loss precisely. But the failure modes the audit log is most valuable against are the silent ones: an upsert that fails-soft, a queue handler that catches and drops, an outbound call that returns 200 but the email never lands.
Without a daily check comparing audit count to row count (inbound) or audit count to 3rd-party dashboard count (outbound), the audit log only ever helps incidents you already noticed. With it, the canary catches the ones you didn’t. We had not wired the canary yet at the time of today’s outage; it’s the first thing on the follow-up list.
Lesson: an audit log without a canary is a forensic tool. An audit log with a canary is a detection tool.
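The inbound half of the canary reduces to a small comparison. The sketch below assumes the two inputs have already been fetched for the same 24h window: the idempotency keys from the audit log query and the row count from the destination table. `checkInboundParity` and the `tolerance` parameter are hypothetical. Counting distinct keys rather than raw audit lines is also an assumption, made because retries emit one audit per attempt.

```typescript
interface ParityResult {
  auditCount: number; // distinct idempotency keys seen in the audit log
  rowCount: number; // rows that actually landed in the destination table
  delta: number; // positive = audited but never written
  alert: boolean;
}

function checkInboundParity(auditKeys: string[], rowCount: number, tolerance = 0): ParityResult {
  const auditCount = new Set(auditKeys).size; // collapse retries of the same payload
  const delta = auditCount - rowCount;
  return { auditCount, rowCount, delta, alert: Math.abs(delta) > tolerance };
}
```

Whether a single alerting run should page anyone, or only a delta that persists across consecutive runs, is left to the daily ops check that calls this.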
What This Costs
Half a day of an engineer’s attention. The emitter is one file. The inbound wiring is one line at each insert site (or zero if your inbound routes share a wrapper). The outbound wiring is a thin wrap around your HTTP client (if you have one) or a small consolidation pass (if you don’t, which is itself overdue work). The runbooks are two pages of bash and SQL. The count-parity canary is two metrics added to whatever daily check already exists.
The payback is asymmetric and recurring. The day you don’t need it, you don’t notice it. The day you need it — when a customer disputes a charge, when support asks “did we send that,” when a deploy ships a regression — the audit pays for the entire implementation in the first hour. Each “did this go through” question becomes answerable in five minutes instead of “let me check and get back to you.”
The Broader Point
Most observability budgets get spent on what happens inside a request — APM traces, error reporting, structured logs of business events. The hole this misses is what happened at the system boundary — what the 3rd party sent us before our code dropped it, what we sent the 3rd party before the network swallowed our response. Trace tools don’t cover these because the call errored out before the trace landed. Error reporters know there was a 500 but not what the body said. Structured business-event logs cover successes.
Audit logging fills the boundary hole. It’s a small commitment with an outsized recovery upside, and it’s exactly the kind of cross-cutting infrastructure work an AI coding agent does well — find the chokepoints in both directions, write the wrapper, generate the two runbooks, propose the canary. The valuable judgement (which payloads carry PII you don’t want logged, what retention the log groups actually need, which 3rd parties support Idempotency-Key headers) stays with the human. The mechanical work, which is most of it, doesn’t.
The next time a customer asks whether their webhook fired last Tuesday — or whether you actually charged their card on Friday — your answer is a 30-second log query instead of an apology.
Built on PlanB, a Bubble.io backup service. Stack: AWS Lambda backup pipeline → Next.js 16 App Router webhook routes on ECS → Supabase Postgres, with outbound integrations to Stripe (billing), Postmark (email), Cloudflare (DNS), and OpenRouter (LLM calls); observability via AWS CloudWatch. The inbound version of this audit pattern surfaced one genuinely lost row out of 40 callback attempts during a 2.5-hour outage today; recovery from log query to verified re-insert took 30 minutes. Work done with Claude Code (Opus 4.7) over the course of the incident itself.