Designing Durable Event Tables (Without Crying in Prod)

Event tables rot fast when every product pod ships whatever JSON felt right that sprint. Durable design reduces rework, retro fixes, and silent metric drift.

Common Failure Modes

  - Payload drift: each team adds or renames JSON fields ad hoc, so no two quarters of data parse the same way.
  - Mixed grain: session-level and user-level rows in the same table, silently double-counting metrics.
  - In-place semantic changes: an event name keeps firing but means something different after a release.
  - Quiet deletes and rewrites: backfills that mutate history and break reports that already shipped.

Principles I Stick To

  1. Stability over clever names. Prefer product_add_to_cart over cart_add.
  2. One grain per table. If you need session + user, build a derived view.
  3. Append only. Fix upstream generation, never delete rows quietly.
  4. Version intentionally. If structure meaningfully changes, create event_v2.

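Principle 2 in practice: if the base table is event-grain, a session rollup belongs in a derived view, not in extra columns bolted onto the base table. A minimal sketch (the view name and aggregates are illustrative, not a prescribed schema):

CREATE VIEW sessions_v1 AS
SELECT
  session_id,
  MIN(user_id)    AS user_id,        -- stable within a session
  MIN(event_time) AS session_start,
  MAX(event_time) AS session_end,
  COUNT(*)        AS event_count
FROM events
GROUP BY session_id;

The base table keeps exactly one grain; anything coarser is recomputable from it.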
Schema Skeleton

CREATE TABLE events (
  event_id        BIGINT,       -- platform generated
  user_id         BIGINT,
  session_id      BIGINT,
  event_name      TEXT,
  event_time      TIMESTAMP,
  src_ts          TIMESTAMP,    -- raw ingestion time
  received_at     TIMESTAMP,    -- warehouse processed time
  context_json    VARIANT,      -- original payload
  app_version     TEXT,
  platform        TEXT,
  country_code    TEXT
) CLUSTER BY (event_time);
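The context_json column keeps the raw payload queryable without polluting the core schema. A decoding view can expose the commonly used payload fields so analysts never touch raw JSON. Snowflake-style VARIANT path syntax is assumed here, and the field names are illustrative:

-- Hypothetical decoding view over the raw payload.
-- Snowflake VARIANT syntax; adjust casts for your warehouse.
CREATE VIEW events_flat_v1 AS
SELECT
  event_id,
  event_name,
  event_time,
  context_json:page_url::TEXT      AS page_url,       -- illustrative field
  context_json:referrer::TEXT      AS referrer,       -- illustrative field
  context_json:experiment_id::TEXT AS experiment_id   -- illustrative field
FROM events;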

Patterns That Help

1. Late Data Bucketing: maintain a rolling fix-up job that recomputes only the last 3 days of aggregates.
2. Replay Safety: dedupe idempotently on event_id (or a hash of stable fields) so replayed batches never double-count.
3. Soft Version Flag: add a schema_version column even before you think you need it.
4. Context Decoding Views: create events_flat_v1 with selected JSON fields for analysts; guardrails reduce regex spelunking.
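
Pattern 2 as a sketch: with Snowflake-style QUALIFY, a replayed batch collapses to one row per event_id, keeping the earliest arrival (syntax assumed; adapt to your warehouse's window-function support):

-- Idempotent dedupe: one row per event_id survives any replay.
CREATE TABLE events_deduped AS
SELECT *
FROM events
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id
  ORDER BY received_at        -- earliest arrival wins
) = 1;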

Data Quality Triggers

SELECT event_name,
  COUNT(*) AS events,
  COUNT(DISTINCT user_id) AS users,
  SUM(CASE WHEN app_version IS NULL THEN 1 ELSE 0 END) AS missing_app_version
FROM events
WHERE event_time >= CURRENT_DATE - 1
GROUP BY 1
ORDER BY events DESC;
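
To turn the query above into a trigger rather than a dashboard, a HAVING clause can surface only the event names whose null rate crosses a threshold (the 5% cutoff is an arbitrary example, not a recommendation):

-- Alert-style variant: only event names where more than 5% of
-- yesterday's rows are missing app_version.
SELECT event_name,
  COUNT(*) AS events,
  SUM(CASE WHEN app_version IS NULL THEN 1 ELSE 0 END) AS missing_app_version
FROM events
WHERE event_time >= CURRENT_DATE - 1
GROUP BY 1
HAVING SUM(CASE WHEN app_version IS NULL THEN 1 ELSE 0 END) > 0.05 * COUNT(*)
ORDER BY events DESC;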

Evolving Safely

If you must change semantics (e.g., checkout_completed now fires only on payment success), freeze v1 and start emitting v2. Run a comparison report for 30 days so downstream jobs can migrate cleanly.
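
The comparison report can be as simple as daily counts side by side. This sketch assumes v2 is emitted under a suffixed event name; if v2 lands in a separate table instead, join the two daily counts:

-- Daily v1 vs v2 volume during the 30-day migration window.
SELECT
  DATE(event_time) AS day,
  SUM(CASE WHEN event_name = 'checkout_completed'    THEN 1 ELSE 0 END) AS v1_events,
  SUM(CASE WHEN event_name = 'checkout_completed_v2' THEN 1 ELSE 0 END) AS v2_events
FROM events
WHERE event_time >= CURRENT_DATE - 30
GROUP BY 1
ORDER BY 1;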

Durable event tables are boring by design. Boring scales.

Takeaways

Build once, extend sanely, sleep better.