
Engineering

February 25, 2026

12 min read

The Outbox Pattern Explained: Reliable Event Publishing for Microservices

Learn how the transactional outbox pattern solves the dual-write problem in microservices, how it integrates with Change Data Capture, and how to implement it reliably.

TL;DR:

  • The dual-write problem causes data loss and inconsistency when services write to a database and a message broker in the same operation.
  • The outbox pattern solves this by writing events to a database table atomically with the business data, then streaming those events out separately.
  • Change Data Capture is the most operationally clean way to implement the outbox tail - it eliminates polling, reduces latency, and decouples the publishing concern entirely.
  • Managed CDC platforms like Streamkap remove the infrastructure burden of running Debezium or custom connectors yourself.

Every microservices architecture eventually runs into the same problem: how do you keep a database update and a message broker publish in sync when the two operations cannot participate in a single atomic transaction? This is called the dual-write problem, and it is the source of some of the most insidious bugs in distributed systems - the kind that only surface under failure conditions, in production, at the worst possible time.

The transactional outbox pattern is the industry-standard solution. This guide explains what it is, how it works, how Change Data Capture fits in, and how to evaluate implementation approaches for your team.

The Dual-Write Problem

Imagine an e-commerce service that processes an order. Two things need to happen:

  1. The order is saved to the database with status CONFIRMED.
  2. An OrderConfirmed event is published to Kafka so the inventory service, the notification service, and the analytics pipeline can react.

The naive implementation does both in sequence:

db.save(order)           # Step 1: write to database
kafka.publish(event)     # Step 2: publish to Kafka

This looks fine, but consider what happens when things go wrong:

  • Database succeeds, Kafka fails: The order is saved. The event is never published. Downstream services never know the order exists. Inventory is never reserved. The customer gets no confirmation email.
  • Kafka succeeds, database fails: The event is published. Downstream services start processing an order that does not exist in the database. You have phantom inventory reservations and phantom emails.
  • Network partition between the two writes: Either scenario above, depending on where the partition lands.

Both failure modes leave the system silently inconsistent. Even when the application does see an exception, it arrives too late: the first write has already committed and cannot be rolled back. The root problem is that no atomic transaction can span a relational database and a message broker.
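The failure window is easy to reproduce in miniature. The sketch below uses sqlite3 as a stand-in database and a fake broker that is down; all names are illustrative, not from any real producer API:

```python
import sqlite3

class FlakyBroker:
    """Stand-in for a Kafka producer whose publish call can fail."""
    def __init__(self):
        self.published = []

    def publish(self, event):
        raise ConnectionError("broker unreachable")  # simulate an outage

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
broker = FlakyBroker()

# Naive dual write: two independent operations, no shared transaction.
try:
    with db:  # commits the order insert on success
        db.execute("INSERT INTO orders VALUES ('o-1', 'CONFIRMED')")
    broker.publish({"type": "OrderConfirmed", "orderId": "o-1"})
except ConnectionError:
    pass  # too late: the database commit already happened

saved = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(saved, len(broker.published))  # 1 order saved, 0 events published
```

The order exists, the event does not, and no amount of retry logic in the except block can make the two writes atomic after the fact.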

What the Outbox Pattern Does

The outbox pattern eliminates the dual-write problem by turning the event publish into a database write. Instead of publishing directly to Kafka, the service writes the event to an outbox table in the same database - inside the same database transaction as the business data:

BEGIN;

INSERT INTO orders (id, status, ...) VALUES (...);

INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload, created_at)
VALUES (gen_random_uuid(), 'Order', order_id, 'OrderConfirmed', '{"orderId": ...}', NOW());

COMMIT;

Either both rows land in the database or neither does. The database transaction gives you atomicity across both writes. The “publish to Kafka” step is now a separate concern handled by an outbox processor, which reads from the outbox table and publishes to the broker.

This separation of concerns is the key insight: the application is only responsible for writing consistently to its own database. The event publishing infrastructure handles the rest.
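The write side can be sketched in a few lines, here using sqlite3 and a simplified schema purely for illustration:

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (
        id             TEXT PRIMARY KEY,
        aggregate_type TEXT NOT NULL,
        aggregate_id   TEXT NOT NULL,
        event_type     TEXT NOT NULL,
        payload        TEXT NOT NULL
    );
""")

def confirm_order(order_id):
    # One transaction: the order row and the outbox row commit together,
    # or neither does. The "with db" block rolls back on any exception.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'CONFIRMED')", (order_id,))
        db.execute(
            "INSERT INTO outbox VALUES (?, 'Order', ?, 'OrderConfirmed', ?)",
            (str(uuid.uuid4()), order_id, json.dumps({"orderId": order_id})),
        )

confirm_order("o-1")
```

Note that nothing here talks to Kafka: the application's only responsibility is the local transaction.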

Outbox Table Design

A minimal outbox table looks like this:

CREATE TABLE outbox (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  aggregate_type VARCHAR(100) NOT NULL,   -- e.g. 'Order', 'Payment'
  aggregate_id   VARCHAR(100) NOT NULL,   -- e.g. the order ID
  event_type     VARCHAR(100) NOT NULL,   -- e.g. 'OrderConfirmed'
  payload        JSONB NOT NULL,          -- the event body
  created_at     TIMESTAMPTZ DEFAULT NOW(),
  processed_at   TIMESTAMPTZ              -- NULL until consumed
);

aggregate_type and aggregate_id together let you route events to the correct Kafka topic. event_type lets consumers determine how to handle the event. processed_at is used by polling-based implementations to track what has been consumed.

For high-throughput systems, add an index on (processed_at, created_at) to make polling efficient. You will also want a cleanup job to delete or archive old processed rows so the table does not grow unbounded.

How to Process the Outbox

Writing to the outbox table is only half the pattern. You need a mechanism to read those rows and publish them to Kafka. There are two main approaches.

Approach 1: Polling

A background job (a separate thread, a scheduled task, or a sidecar process) periodically queries for unprocessed rows:

SELECT * FROM outbox WHERE processed_at IS NULL ORDER BY created_at LIMIT 100;

It publishes each event to Kafka, then marks the row as processed:

UPDATE outbox SET processed_at = NOW() WHERE id = ANY($1);

Polling is simple to implement and easy to understand. The drawbacks are latency (your poll interval sets a floor on event delivery delay), database load (constant queries even when there is nothing to process), and duplicate deliveries: if the publisher crashes after publishing but before marking the row as processed, the event will be published again on the next poll. The pattern gives you at-least-once delivery, so consumers must be idempotent.
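The polling loop can be sketched end to end with sqlite3 as the database and an in-memory list standing in for the Kafka producer (illustrative only; a real implementation would batch the updates and handle publish failures):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE outbox (
    id TEXT PRIMARY KEY, event_type TEXT, payload TEXT,
    created_at INTEGER, processed_at INTEGER)""")
db.executemany(
    "INSERT INTO outbox VALUES (?, 'OrderConfirmed', '{}', ?, NULL)",
    [(f"e-{i}", i) for i in range(3)],
)
db.commit()

published = []  # stand-in for a Kafka producer

def poll_once(batch_size=100):
    """One iteration of the polling loop: read, publish, then mark."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE processed_at IS NULL ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, event_type, payload in rows:
        published.append((event_type, payload))   # publish first ...
        db.execute(                               # ... then mark processed;
            "UPDATE outbox SET processed_at = strftime('%s','now') "
            "WHERE id = ?",
            (row_id,),
        )                                         # a crash between the two
    db.commit()                                   # causes a re-publish
    return len(rows)

poll_once()
```

The publish-then-mark ordering is deliberate: marking first would risk losing an event on a crash, while marking second only risks a duplicate, which idempotent consumers can absorb.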

Approach 2: Change Data Capture

Change Data Capture (CDC) is a technique for reading a database’s internal replication log - the write-ahead log (WAL) in PostgreSQL, the binlog in MySQL. Every INSERT, UPDATE, and DELETE that commits to the database is recorded in this log. A CDC tool reads the log and streams the resulting change events downstream.

When you use CDC to tail the outbox table, you get:

  • Near-zero latency: Events appear in Kafka within milliseconds of the database transaction committing, not on the next poll cycle.
  • Minimal additional database load: Reading the replication log is a lightweight operation that does not require executing polling queries against the database.
  • No polling logic to write or maintain: The CDC tool handles all of it.
  • Natural ordering: The replication log preserves the exact order in which transactions committed.

The most widely used open-source CDC tool for this pattern is Debezium, which supports PostgreSQL (via logical replication), MySQL (via binlog), and several others. Debezium runs as a Kafka Connect connector and emits change events to Kafka topics.

The downside of CDC-based outbox processing is operational complexity: you need to deploy and manage Debezium, configure logical replication on your database, monitor lag, handle connector restarts, and deal with schema evolution. This is non-trivial infrastructure.

Outbox Pattern + CDC: The Standard Architecture

The combination of the outbox pattern and CDC has become the standard architecture for reliable event publishing in serious microservices deployments. The full picture looks like this:

              Application
                   │
                   ▼
  Database Transaction ──► orders table
                      └──► outbox table
                   │
                   ▼
  CDC (reads replication log)
                   │
                   ▼
                 Kafka
          ┌────────┴────────┐
          ▼                 ▼
  Inventory Service   Notification Service

Each downstream service is independent and reacts to the event at its own pace. If the inventory service is down, its Kafka consumer group simply falls behind and catches up when it recovers - no data is lost. The application’s database is the single source of truth, and the outbox is the durable bridge between that truth and the event stream.

Debezium’s Outbox Event Router

Debezium has built-in support for the outbox pattern via its Outbox Event Router SMT (Single Message Transform). When enabled, this transform reads rows from the outbox table and:

  1. Routes each event to a Kafka topic based on aggregate_type (by default, events with aggregate_type = 'Order' go to outbox.event.Order; the topic naming is configurable)
  2. Sets the Kafka message key to aggregate_id (ensuring ordering per entity)
  3. Sets the Kafka message value to payload
  4. Works without a processed_at marker: the usual approach is to delete the outbox row right after inserting it - the connector reads the insert from the replication log either way, so the table stays effectively empty

This means you do not need to build any custom routing or serialization logic. You write the event to the outbox table in the correct format, and the router handles the rest.
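A sketch of the relevant Kafka Connect configuration for a PostgreSQL connector with the Outbox Event Router enabled. The values are illustrative, and the field mappings assume the column names from the table above (Debezium's defaults are aggregatetype, aggregateid, and payload, without underscores):

```properties
# Debezium PostgreSQL source connector (illustrative fragment)
connector.class=io.debezium.connector.postgresql.PostgresConnector
table.include.list=public.outbox

# Enable the Outbox Event Router SMT
transforms=outbox
transforms.outbox.type=io.debezium.transforms.outbox.EventRouter

# Map this article's column names onto the router's expected fields
transforms.outbox.route.by.field=aggregate_type
transforms.outbox.table.field.event.key=aggregate_id
transforms.outbox.table.field.event.payload=payload

# Default topic naming is outbox.event.<aggregate_type>; override with:
# transforms.outbox.route.topic.replacement=outbox.event.${routedByValue}
```

Check these property names against the Debezium documentation for your connector version before relying on them; the SMT options have evolved across releases.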

Implementation Pitfalls

Event Schema Evolution

Your outbox payload is JSON stored in a database column. If consumers depend on specific fields in that JSON, schema changes can break them. Treat your outbox event schema the same way you would treat an API contract: version it, communicate changes, and use forward-compatible schemas (add fields, never remove or rename them without a migration plan).
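One concrete way to stay forward-compatible on the consumer side is a tolerant reader: read only the fields you need, default any field a newer schema version added, and ignore fields you do not recognize. The handler below is hypothetical, not from the article:

```python
import json

def handle_order_confirmed(raw_payload: str):
    """Tolerant reader: default missing fields, ignore unknown ones."""
    event = json.loads(raw_payload)
    order_id = event["orderId"]               # required since v1
    currency = event.get("currency", "USD")   # added in v2; default for old events
    # Unknown future fields in `event` are simply ignored.
    return order_id, currency

print(handle_order_confirmed('{"orderId": "o-1"}'))
print(handle_order_confirmed('{"orderId": "o-2", "currency": "EUR", "new_field": 1}'))
```

Old events, new events, and events from future producers all pass through the same handler without breaking.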

Large Payloads

Avoid putting large blobs in the outbox payload. If an order confirmation needs to include a 5MB PDF, store the PDF in object storage and put a reference URL in the event. Kafka’s default maximum message size is roughly 1 MB, and large payloads increase replication lag and consumer memory pressure.

Ordering Guarantees

CDC preserves the order in which rows were inserted into the outbox. Kafka preserves order within a partition. Since Debezium uses aggregate_id as the message key, all events for the same entity (e.g., the same order) land on the same Kafka partition, preserving per-entity ordering. Cross-entity ordering is not guaranteed, which is correct behavior - two different orders can be processed in parallel.
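The per-entity ordering guarantee follows from deterministic key-based partitioning: the same key always hashes to the same partition. A simplified sketch (Kafka's actual default partitioner uses murmur2; this uses a stdlib checksum purely for illustration):

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str) -> int:
    # Deterministic: the same aggregate_id always maps to the same
    # partition, so all events for one order stay in order there.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

keys = ["order-42", "order-42", "order-7", "order-42"]
partitions = [partition_for(k) for k in keys]
assert partitions[0] == partitions[1] == partitions[3]  # same order, same partition
print(partitions)
```

Events for order-42 and order-7 may interleave across partitions, but each order's own events never reorder - exactly the guarantee the article describes.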

Cleanup

The outbox table will accumulate rows indefinitely if you do not clean up. Add a scheduled job that deletes rows older than N days (or after they have been processed). Index the created_at or processed_at column to make this efficient.

When to Use the Outbox Pattern

The outbox pattern is the right choice when:

  • You cannot tolerate event loss: Financial transactions, order processing, anything where a missing event has real-world consequences.
  • You need strong consistency between your database and your event stream: The state in your database and the events in Kafka must agree.
  • You are operating at scale: High throughput scenarios where polling becomes a bottleneck.
  • You have regulatory requirements: Audit trails and compliance requirements often mandate that every state change is durably recorded.

It is probably overkill when you are building a small internal service, when events are purely informational (losing one is annoying but not catastrophic), or when your organization does not yet have the infrastructure to run CDC reliably.

Managed CDC with Streamkap

Running Debezium yourself is a significant operational investment. You need to configure logical replication slots on your database, manage connector restart policies, monitor replication lag, handle offset storage, and deal with connector upgrades. For teams that want the reliability of CDC-based outbox processing without the infrastructure overhead, managed CDC platforms provide a turnkey alternative.

Streamkap connects directly to your PostgreSQL, MySQL, or other database’s replication log and streams change events to Kafka without you needing to deploy or manage any connector infrastructure. The outbox table tail becomes a configuration rather than an engineering project. You define which tables to capture, where to send events, and the platform handles the rest - including schema evolution, restarts, and lag monitoring.

This is particularly valuable for teams that want to adopt the outbox pattern quickly or that are running the pattern across multiple services, each with its own database.

Summary

The outbox pattern is one of the most important reliability patterns in distributed systems. By making event publication a database concern rather than an application concern, it eliminates the dual-write problem and gives you durable, ordered, at-least-once event delivery.

CDC is the cleanest way to implement the outbox tail: it is lower latency than polling, has less database overhead, and requires no polling logic in your application. The trade-off is operational complexity, which managed platforms can absorb.

If your microservices need to publish events reliably - and most production systems do - the outbox pattern with CDC is the architecture worth understanding and adopting.