
Engineering

February 25, 2026


Event Sourcing with CDC: Deriving Events from Database State

How CDC bridges the gap between traditional CRUD databases and event sourcing patterns. Learn to retrofit event streams onto existing systems without rewriting your application.

TL;DR:

  • Event sourcing stores state as a sequence of events rather than current values.
  • Most existing applications use CRUD databases, making a full event sourcing rewrite impractical.
  • CDC bridges this gap by capturing database changes as events without modifying the application.
  • This gives you event-sourced downstream consumers while keeping your existing database-backed services unchanged.

If you have spent any time building distributed systems, you have probably heard the pitch for event sourcing: store every state change as an immutable event, replay those events to reconstruct state, and gain a perfect audit trail of everything that ever happened in your system.

It is a compelling idea. It is also a massive undertaking if you already have a running application backed by a traditional relational database. Rewriting an existing service to use event sourcing from scratch is rarely practical, and it introduces risk into systems that are already working.

But what if you could get most of the benefits of event sourcing without touching your application code? That is exactly where change data capture (CDC) comes in.

Event Sourcing in 60 Seconds

In a traditional CRUD system, your database holds the current state. When a customer updates their shipping address, the old address is overwritten. The previous value is gone unless you built something to preserve it.

Event sourcing takes the opposite approach. Instead of storing current state, you store the sequence of events that produced it:

CustomerCreated { id: 42, name: "Alice", address: "123 Main St" }
AddressUpdated  { id: 42, address: "456 Oak Ave" }
AddressUpdated  { id: 42, address: "789 Pine Rd" }

To get the current state, you replay the events in order. The event log is the source of truth, and the current state is a derived projection.
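Replay is just a fold over the event log. Here is a minimal sketch in Python, using the customer events above; the `apply_event` and `replay` helpers are illustrative names, not a real event sourcing API:

```python
# Minimal sketch of event replay: current state is a fold over the event log.
# Event shapes mirror the CustomerCreated / AddressUpdated examples above.

def apply_event(state, event):
    """Apply a single event to the current state dict."""
    kind, payload = event
    if kind == "CustomerCreated":
        return dict(payload)  # initial state is the creation payload
    if kind == "AddressUpdated":
        return {**state, "address": payload["address"]}
    return state  # ignore unknown event types

def replay(events):
    """Reconstruct current state by replaying events in order."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state

events = [
    ("CustomerCreated", {"id": 42, "name": "Alice", "address": "123 Main St"}),
    ("AddressUpdated", {"address": "456 Oak Ave"}),
    ("AddressUpdated", {"address": "789 Pine Rd"}),
]
print(replay(events))  # {'id': 42, 'name': 'Alice', 'address': '789 Pine Rd'}
```

Replaying a prefix of the log gives you the state at any earlier point, which is exactly the temporal-query property discussed below.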

This gives you a few things that CRUD databases do not:

  • Full history - Every change is preserved. You can answer “what did this record look like at 3pm last Tuesday?”
  • Audit trail - The event log is a natural, immutable audit record.
  • Temporal queries - You can reconstruct state at any point in time.
  • Multiple projections - Different consumers can build different views from the same event stream.

These are real advantages. They are also the reason event sourcing has become so popular in domains like finance, healthcare, and e-commerce where the history of changes matters as much as the current state.

Why Full Rewrites Rarely Happen

If event sourcing is so useful, why doesn't everyone adopt it? The answer is straightforward: most applications were not designed for it, and retrofitting event sourcing into an existing system is one of the hardest migrations you can attempt.

Consider what a full event sourcing rewrite actually requires:

  1. Redesign your data model - Replace tables and rows with event streams and projections. This is not a schema migration. It is a fundamental change in how you think about data.
  2. Rewrite your application layer - Every service that reads or writes data needs to work with events instead of direct database queries.
  3. Handle eventual consistency - Event-sourced systems are inherently eventually consistent. Your UI, your APIs, and your business logic all need to account for this.
  4. Migrate existing data - You need to generate synthetic “seed” events for all existing records, or run old and new systems in parallel during a cutover period.
  5. Retrain your team - Event sourcing requires a different mental model. Debugging, testing, and operational runbooks all change.

For a greenfield project, this is manageable. For a production system serving real traffic? The risk and cost are difficult to justify, especially when most of the value comes from downstream event processing rather than the event store itself.

CDC as a Pragmatic Bridge

Change data capture sidesteps the rewrite problem entirely. Instead of changing how your application writes data, CDC reads the database’s own change log - the write-ahead log (WAL) in PostgreSQL, the binlog in MySQL, or the oplog in MongoDB - and turns those low-level changes into a stream of events.

Your application keeps writing to the database exactly as it does today. No code changes. No new data model. No eventual consistency headaches in your primary service. But downstream, you now have an event stream that captures every insert, update, and delete.

Here is the key insight: from the perspective of a downstream consumer, a CDC stream looks a lot like an event-sourced stream. The consumer receives an ordered sequence of changes, can process them in real time, and can build whatever projections it needs.

The difference is subtle but important. In true event sourcing, the events are the primary artifact and the database state is derived. With CDC, the database state is primary and the events are derived. But for many practical use cases - populating a search index, feeding an analytics warehouse, syncing to a cache, driving notifications - the result is the same.

What CDC Events Actually Look Like

When you set up CDC on a PostgreSQL table, each captured change produces a structured event. A typical CDC event for a row update includes:

{
  "op": "u",
  "before": {
    "id": 42,
    "status": "pending",
    "updated_at": "2026-02-24T10:00:00Z"
  },
  "after": {
    "id": 42,
    "status": "shipped",
    "updated_at": "2026-02-25T14:30:00Z"
  },
  "source": {
    "db": "orders",
    "table": "shipments",
    "lsn": "0/1A2B3C4"
  },
  "ts_ms": 1740494400000
}

You get the operation type (c for create, u for update, d for delete), the before and after states, and metadata about the source. This is enough to reconstruct the full history of changes for any row in the table.
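Because each event carries both states, a consumer can rebuild the full version history of a row with nothing but the stream. A small sketch, assuming the event shape shown above (`op`, `before`, `after`, `ts_ms`); `row_history` is an illustrative helper, not part of any CDC library:

```python
# Sketch: rebuild the version history of a single row from CDC events.
# The event shape (op / before / after / ts_ms) follows the example above.

def row_history(cdc_events, row_id):
    """Return the successive states of one row, oldest first."""
    versions = []
    for ev in sorted(cdc_events, key=lambda e: e["ts_ms"]):
        if ev["op"] in ("c", "u") and ev["after"]["id"] == row_id:
            versions.append(ev["after"])  # create/update: new state
        elif ev["op"] == "d" and ev["before"]["id"] == row_id:
            versions.append(None)  # delete: row no longer exists
    return versions

events = [
    {"op": "c", "before": None,
     "after": {"id": 42, "status": "pending"}, "ts_ms": 1},
    {"op": "u", "before": {"id": 42, "status": "pending"},
     "after": {"id": 42, "status": "shipped"}, "ts_ms": 2},
]
print(row_history(events, 42))
# [{'id': 42, 'status': 'pending'}, {'id': 42, 'status': 'shipped'}]
```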

Implementing CDC-Based Event Sourcing

A typical architecture for CDC-based event sourcing has four layers:

1. Source Database - Your existing PostgreSQL, MySQL, or MongoDB instance. Enable logical replication (PostgreSQL) or binlog (MySQL).

2. CDC Connector - Captures changes from the database log and publishes them to a message broker. This is where a platform like Streamkap fits - it handles the connector setup, manages offsets, and routes changes into Kafka topics without you needing to operate Debezium clusters yourself.

3. Kafka - Acts as the durable event log. Each table typically maps to a topic, and events are partitioned by primary key to maintain ordering per entity.

4. Stream Processors and Consumers - Downstream services that consume the event stream. This might be a Flink job that enriches or transforms events, an Elasticsearch indexer, or a service that builds materialized views.

The result is that your source database keeps doing what it has always done, while a parallel event stream flows to every system that needs it. You get the downstream benefits of event sourcing - real-time projections, replayable history, multiple consumers - without any of the upstream rewrite cost.
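To make layer 4 concrete, here is a sketch of a consumer that materializes a view from the stream. In production this loop would read from a Kafka topic (for example with a Kafka client library); an in-memory list stands in for the topic here so the projection logic is self-contained:

```python
# Sketch of layer 4: a consumer that builds a materialized view from a CDC stream.
# An in-memory list stands in for a Kafka topic partition, already in key order.

def materialize(view, event):
    """Upsert/delete into a dict keyed by primary key - a minimal projection."""
    key = (event["before"] or event["after"])["id"]
    if event["op"] == "d":
        view.pop(key, None)         # delete removes the row from the view
    else:
        view[key] = event["after"]  # create/update upserts the latest state
    return view

topic = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "u", "before": {"id": 1, "status": "pending"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "pending"}},
    {"op": "d", "before": {"id": 2, "status": "pending"}, "after": None},
]

view = {}
for event in topic:
    materialize(view, event)
print(view)  # {1: {'id': 1, 'status': 'shipped'}}
```

Because events for one entity always land on the same partition, processing them in partition order is enough to keep the view consistent.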

Deriving Business Events from Row Changes

A fair criticism of CDC compared to true event sourcing is that CDC events are structural, not semantic. A CDC event says “the status column on row 42 changed from pending to shipped.” A true event-sourced system would emit an OrderShipped event with the order ID, tracking number, and carrier.

The structural event contains the same information, but it requires the consumer to interpret what the column change means. This is where stream processing fills the gap.

With a Flink job sitting between your CDC stream and your consumers, you can transform raw CDC events into higher-level business events:

-- Flink SQL: derive business events from CDC changes
SELECT
  after.id AS order_id,
  'OrderShipped' AS event_type,
  after.tracking_number,
  after.carrier,
  event_time
FROM orders_cdc
WHERE op = 'u'
  AND before.status = 'pending'
  AND after.status = 'shipped'

This gives you the best of both worlds. The source application writes to a regular database table. CDC captures the raw changes. Flink derives meaningful business events. Downstream consumers get clean, typed events like OrderShipped, PaymentReceived, or InventoryDepleted - as if the system were event-sourced from the start.
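The same derivation can be sketched as a plain function, which is handy for unit-testing the mapping logic before deploying it as a streaming job. Field names follow the article's examples; the function itself is illustrative:

```python
# The same derivation as the Flink SQL above, as a plain Python function:
# a raw CDC update becomes an OrderShipped business event when status flips
# from 'pending' to 'shipped'.

def derive_business_event(cdc_event):
    """Map a raw CDC change to a business event, or None if it isn't one."""
    if (cdc_event["op"] == "u"
            and cdc_event["before"]["status"] == "pending"
            and cdc_event["after"]["status"] == "shipped"):
        return {
            "event_type": "OrderShipped",
            "order_id": cdc_event["after"]["id"],
            "tracking_number": cdc_event["after"].get("tracking_number"),
        }
    return None  # not a transition we care about

raw = {"op": "u",
       "before": {"id": 42, "status": "pending"},
       "after": {"id": 42, "status": "shipped", "tracking_number": "1Z999AA1"}}
print(derive_business_event(raw))
# {'event_type': 'OrderShipped', 'order_id': 42, 'tracking_number': '1Z999AA1'}
```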

Streamkap supports this workflow end to end. CDC captures go through managed Kafka, and you can apply Flink transformations to shape raw change events into whatever your downstream systems need.

The Outbox Pattern: A Middle Ground

Before going further, it is worth mentioning the transactional outbox pattern, since it sits between pure CDC and full event sourcing.

With the outbox pattern, your application writes to both a business table and an outbox table in the same database transaction:

BEGIN;
  UPDATE orders SET status = 'shipped' WHERE id = 42;
  INSERT INTO outbox (event_type, payload) VALUES (
    'OrderShipped',
    '{"order_id": 42, "tracking": "1Z999AA1"}'
  );
COMMIT;

A CDC connector then reads from the outbox table and publishes those events. Because both writes happen in the same transaction, you get exactly-once semantics between the state change and the event.
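The atomicity guarantee is easy to demonstrate. This sketch uses SQLite (from the standard library) so it runs self-contained; against PostgreSQL the two statements would be identical in spirit:

```python
# Sketch of the outbox write, using SQLite so the example is self-contained;
# in production the same two statements would run against PostgreSQL.
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT);
    INSERT INTO orders (id, status) VALUES (42, 'paid');
""")

# Both writes happen in one transaction: the event exists iff the update committed.
with conn:  # commits on success, rolls back on exception
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 42")
    conn.execute(
        "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
        ("OrderShipped", json.dumps({"order_id": 42, "tracking": "1Z999AA1"})),
    )

print(conn.execute("SELECT status FROM orders WHERE id = 42").fetchone()[0])
print(conn.execute("SELECT event_type FROM outbox").fetchone()[0])
```

If the transaction rolls back, neither the status change nor the outbox row survives, which is exactly the property the pattern relies on.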

The outbox pattern has real advantages:

  • Business-level events - You define the event schema, not the CDC connector.
  • Transactional consistency - The event is guaranteed to exist if and only if the state change committed.
  • Explicit intent - The event carries the business meaning directly.

But it also has costs:

  • Application changes required - You need to modify every write path to also insert into the outbox table.
  • Outbox table management - The outbox table grows continuously and needs to be cleaned up.
  • Schema coupling - You now have two schemas to maintain per domain event: the database table and the outbox event format.

For teams that can modify their application code, the outbox pattern is a strong choice. For teams that cannot - or do not want to - plain CDC achieves 80% of the result with 0% of the application changes.

Practical Example: E-Commerce Order Lifecycle

Let’s walk through a concrete scenario. You run an e-commerce platform with a monolithic Django application backed by PostgreSQL. The orders table tracks order lifecycle:

id | customer_id | status    | total | updated_at
42 | 7           | created   | 89.99 | 2026-02-25 09:00:00
42 | 7           | paid      | 89.99 | 2026-02-25 09:01:15
42 | 7           | shipped   | 89.99 | 2026-02-25 14:30:00
42 | 7           | delivered | 89.99 | 2026-02-26 11:45:00

Without CDC, these intermediate states are overwritten. Your database only holds the latest row. If you need to know when an order was paid, you either have to query an audit log (if you built one) or search through application logs.

With CDC enabled, every one of those status transitions is captured as a separate event. A downstream analytics service can now:

  • Calculate time between order creation and payment (1 minute 15 seconds)
  • Measure fulfillment speed (5 hours, 28 minutes and 45 seconds from payment to shipment)
  • Track delivery performance (21 hours 15 minutes from shipment to delivery)
  • Build funnel analytics showing where orders stall or drop off
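The duration math falls out of the captured timestamps directly. A quick sketch, using the timestamps from the table above:

```python
# Sketch: compute the order-lifecycle durations from the captured transitions.
from datetime import datetime

transitions = {  # status -> timestamp, taken from the CDC stream for order 42
    "created":   datetime(2026, 2, 25, 9, 0, 0),
    "paid":      datetime(2026, 2, 25, 9, 1, 15),
    "shipped":   datetime(2026, 2, 25, 14, 30, 0),
    "delivered": datetime(2026, 2, 26, 11, 45, 0),
}

print(transitions["paid"] - transitions["created"])       # 0:01:15
print(transitions["shipped"] - transitions["paid"])       # 5:28:45
print(transitions["delivered"] - transitions["shipped"])  # 21:15:00
```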

A notification service can trigger emails at each transition. A fraud detection service can flag orders with unusual timing patterns. An inventory system can reserve stock at creation and release it at cancellation.

All of this happens without modifying the Django application. The application just writes to PostgreSQL as it always has.

When CDC Is Enough - and When It Is Not

CDC-based event sourcing works well when:

  • Your primary goal is downstream consumption. You want to feed data warehouses, search indexes, caches, or analytics systems in real time.
  • You need an audit trail from an existing system. CDC gives you the complete history of changes without modifying the source.
  • Your team cannot or should not rewrite the source application. The source is a third-party system, a legacy codebase, or a stable service that does not need changes.
  • You want incremental adoption. You can start with CDC on one table and expand gradually.

Full event sourcing makes more sense when:

  • Business intent matters more than state changes. If the difference between “customer cancelled the order” and “admin cancelled the order” is important, and both result in the same status = cancelled update, CDC cannot distinguish them.
  • You need event-driven command processing. If your application logic should react to events as the primary flow - not just downstream systems - event sourcing is the right fit.
  • You are building a new system from scratch. Without legacy constraints, you can design for event sourcing from day one and avoid the mismatch entirely.
  • Regulatory requirements demand explicit event provenance. Some compliance frameworks require that events carry explicit business semantics, not just structural changes.

For most teams working with existing systems, CDC is the practical starting point. You can always layer in more explicit event modeling later, either through the outbox pattern or by gradually migrating services to true event sourcing where it matters most.

Getting Started

If you want to try CDC-based event sourcing on your existing database, the steps are straightforward:

  1. Enable change capture on your database. For PostgreSQL, set wal_level = logical and create a replication slot. For MySQL, enable the binlog with ROW format. For MongoDB, change streams are available on replica sets by default.

  2. Set up a CDC connector. You can run Debezium yourself, or use a managed platform like Streamkap that handles connector lifecycle, offset management, and schema evolution for you.

  3. Route events through Kafka. Each table maps to a topic. Events are keyed by primary key, so all changes to a single entity land on the same partition in order.

  4. Build your first consumer. Start simple - maybe a service that materializes a denormalized view, or a pipeline that sinks changes to your data warehouse.

  5. Add transformations as needed. Once raw CDC events are flowing, use Flink or similar stream processors to derive business-level events, filter noise, or enrich changes with data from other streams.
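For step 1 on PostgreSQL, the setup looks roughly like this. The slot and publication names are illustrative, and changing wal_level requires a server restart:

```shell
# Step 1 sketch for PostgreSQL: enable logical decoding and create a slot.
psql -c "ALTER SYSTEM SET wal_level = 'logical';"
# ... restart PostgreSQL, then:
psql -c "SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');"
psql -c "CREATE PUBLICATION cdc_pub FOR TABLE orders;"
```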

The entire pipeline can be running in an afternoon. That is a different proposition from a six-month event sourcing rewrite, and for many teams, it delivers the same downstream value.