
Engineering

February 25, 2026

14 min read

CDC to Destination: Architecture Patterns for Every Target

A complete reference for CDC delivery architecture patterns - direct streaming, hub-and-spoke, and transform-in-flight - with destination-specific guidance for Snowflake, BigQuery, ClickHouse, and more.

TL;DR:

  • Direct streaming (CDC to destination) minimizes latency and operational overhead for single-destination pipelines.
  • Hub-and-spoke (CDC to Kafka to multiple destinations) is the standard pattern for data platform teams serving many consumers.
  • Transform-in-flight (CDC to Flink/processing to destination) is required when data must be reshaped, aggregated, or enriched before landing.
  • Each destination - Snowflake, BigQuery, Databricks, ClickHouse, Elasticsearch - has a preferred ingestion API that determines achievable latency and throughput.

Change data capture solves the extraction problem: it captures every row-level change from a source database as a stream of events. But extraction is only half the pipeline. The architectural decisions about how those events travel to their destinations - and what happens to them along the way - determine latency, reliability, schema flexibility, and operational complexity.

This guide is a complete reference for CDC delivery architecture. It covers the three foundational patterns, the tradeoffs between them, and concrete guidance for every major analytical destination.


The Three Foundational Patterns

Pattern 1: Direct Streaming (CDC → Destination)

Source DB ──[CDC]──► Destination
(PostgreSQL)         (ClickHouse / Elasticsearch / Redis)

Direct streaming is the simplest topology: CDC events flow from source to a single destination with no intermediate broker. The CDC connector reads the database transaction log and writes directly to the destination using that destination’s native ingestion protocol.

Characteristics:

  • Lowest latency (sub-second achievable)
  • Simplest operational model (one pipeline to monitor)
  • No fan-out: only works for a single destination
  • Destination availability becomes a hard dependency - backpressure or downtime propagates to the source

Best for: Real-time operational use cases with a single target - search indexes, operational caches, feature stores, real-time dashboards. If ClickHouse is your only target and you want the lowest possible latency, go direct.

Not suitable for: Multi-destination delivery, complex transformations, or scenarios where different consumers need different schemas.


Pattern 2: Hub-and-Spoke (CDC → Kafka → Multiple Destinations)

Source DB ──[CDC]──► Kafka ──► Destination A (Snowflake)
                         ├──► Destination B (BigQuery)
                         ├──► Destination C (Elasticsearch)
                         └──► Destination D (ClickHouse)

Kafka acts as the durable, replayable hub. The CDC connector writes events to Kafka topics; downstream sink connectors (Kafka Connect sinks, custom consumers, or managed connectors) consume those topics and write to each destination independently.

Characteristics:

  • Fan-out: one source feeds any number of destinations
  • Destinations are decoupled - a slow or unavailable destination does not block others
  • Kafka provides replayability: rewind and re-deliver to a destination without re-reading the source DB
  • Operational overhead: Kafka cluster, consumer groups, lag monitoring, schema registry
  • Each hop adds latency (Kafka write + consumer lag)

Best for: Data platform teams serving multiple analytical and operational consumers from the same source system. The standard enterprise pattern.

Kafka topic design for CDC:

  • Topic naming: {env}.{db}.{schema}.{table} (e.g., prod.orders.public.customers)
  • Partitioning: partition by primary key to preserve per-row ordering
  • Retention: at least 7 days; longer for disaster recovery
  • Compaction: enable log compaction for event-sourcing and state-rebuild use cases
  • Schema: Avro or Protobuf with a schema registry for consumer compatibility
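The naming and keying conventions above can be sketched in a few lines of Python. The helper names (topic_name, keyed_record) are illustrative, not from any library, and the actual produce call (e.g., via a Kafka client such as confluent-kafka) is omitted:

```python
import json

def topic_name(env: str, db: str, schema: str, table: str) -> str:
    """Build a CDC topic name following the {env}.{db}.{schema}.{table} convention."""
    return f"{env}.{db}.{schema}.{table}"

def keyed_record(event: dict, pk_field: str):
    """Serialize a CDC event with the primary key as the Kafka message key.

    Keying by primary key means every change to a given row hashes to the
    same partition, which is what preserves per-row ordering.
    """
    key = str(event[pk_field]).encode("utf-8")
    value = json.dumps(event).encode("utf-8")
    return key, value

# Example: an update event for customer 42
topic = topic_name("prod", "orders", "public", "customers")
key, value = keyed_record({"customer_id": 42, "tier": "gold", "_cdc_op": "u"},
                          pk_field="customer_id")
```

Because all events for customer 42 carry the same key, they land on the same partition and consumers see that row's changes in order.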

Pattern 3: Transform-in-Flight (CDC → Processing → Destination)

Source DB ──[CDC]──► Kafka ──► Flink / Processing ──► Destination

When CDC events need to be transformed, enriched, aggregated, or filtered before landing at the destination, a stateful stream processor sits between Kafka and the destination. Apache Flink is the standard choice for this tier.

Characteristics:

  • Full transformation capability: filter, aggregate, join, enrich, reshape
  • Can materialize pre-aggregated views for analytical destinations
  • Adds processing latency on top of the hub-and-spoke latency
  • Higher operational complexity: Flink cluster, checkpointing, state management
  • Enables exactly-once delivery semantics end-to-end (when using Flink’s transactional sink)

Best for: Scenarios requiring pre-aggregation (Snowflake or BigQuery summary tables), cross-stream joins (enrich CDC events before landing), sensitive data masking, or schema transformation.

When not to use it: If your destination (Snowflake, BigQuery) has sufficient compute to do transformations at query time, adding a Flink layer purely for transformation may be unnecessary complexity. Prefer transform-in-flight when the transformation reduces volume significantly (aggregation) or when data must be shaped differently for different consumers from the same source topic.
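To make the transformation tier concrete, here is the kind of logic a transform-in-flight job applies, expressed as plain Python rather than Flink's DataStream API. The field names and the masking scheme are illustrative assumptions:

```python
import hashlib

def mask_email(email: str) -> str:
    """Replace an email with a stable hash so joins on the masked value
    still work, but the raw address never lands in the destination."""
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
    return f"masked-{digest}@example.invalid"

def transform(event: dict):
    """One transform-in-flight step: filter out internal test rows and
    mask PII. Returning None drops the event from the stream."""
    if event.get("tier") == "internal-test":
        return None
    out = dict(event)
    if "email" in out:
        out["email"] = mask_email(out["email"])
    return out

clean = transform({"customer_id": 1, "email": "jane@example.com", "tier": "gold"})
```

In a real deployment the same filter/map logic would run inside a Flink job between the Kafka source and the destination sink.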


Hybrid: Multi-Tier Topologies

Real-world architectures combine patterns. A common enterprise topology:

Source DBs (PostgreSQL, MySQL, Oracle)
        │ CDC
        ▼
Kafka (hub)
    ├──► Direct stream to Elasticsearch (search, sub-second latency required)
    ├──► Direct stream to Redis (feature store, millisecond latency required)
    └──► Flink (transform-in-flight)
              ├──► Snowflake Snowpipe Streaming (enriched analytics)
              └──► BigQuery Storage Write API (ML training datasets)

This topology uses direct streaming for latency-sensitive destinations and hub-and-spoke with transform-in-flight for analytical destinations that benefit from enrichment.


Destination-Specific Guidance

Snowflake

Preferred ingestion API: Snowpipe Streaming (channel-based SDK)

Snowpipe Streaming writes rows directly to Snowflake’s table buffer, bypassing the cloud storage staging step required by standard Snowpipe. This reduces end-to-end latency from minutes to seconds.

Architecture:

CDC → Kafka → Snowflake Kafka Connector (Snowpipe Streaming mode)
                      │
                      ▼
              Snowflake Table Buffer
                      │ (seconds, not minutes)
                      ▼
              Snowflake Table (queryable)

Latency characteristics:

  • Standard Snowpipe (file-based): 1–5 minutes
  • Snowpipe Streaming: 5–30 seconds
  • Direct row insert via JDBC: seconds (but does not scale to high throughput)

Schema evolution: Snowflake supports ADD COLUMN without downtime. The Kafka connector can be configured to automatically add new columns when CDC events contain new fields. DROP COLUMN and type changes require manual intervention.

Deduplication: Snowpipe Streaming provides at-least-once delivery. Deduplicate in Snowflake using MERGE statements or stream-based deduplication in dbt. Include the CDC event’s _change_seq_num or lsn (log sequence number) as a column to detect and discard duplicates.
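The discard-duplicates logic behind that MERGE can be illustrated in plain Python. This is a sketch of the idea only - in practice the comparison runs inside Snowflake SQL - and it assumes the `_cdc_lsn` column from the table design below:

```python
def dedupe_by_lsn(events, pk="customer_id"):
    """Keep only the highest-LSN event per primary key, discarding
    redelivered duplicates and stale out-of-order versions."""
    latest = {}
    for e in events:
        k = e[pk]
        if k not in latest or e["_cdc_lsn"] > latest[k]["_cdc_lsn"]:
            latest[k] = e
    return list(latest.values())

events = [
    {"customer_id": 1, "tier": "gold", "_cdc_lsn": 10},
    {"customer_id": 1, "tier": "gold", "_cdc_lsn": 10},      # redelivered duplicate
    {"customer_id": 1, "tier": "platinum", "_cdc_lsn": 12},  # newer version wins
    {"customer_id": 2, "tier": "silver", "_cdc_lsn": 11},
]
deduped = dedupe_by_lsn(events)
```

Because the LSN is monotonically increasing per source, "highest LSN wins" is safe even when at-least-once delivery replays a batch.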

Recommended table design for CDC targets:

CREATE TABLE customers_cdc (
    customer_id     VARCHAR,
    email           VARCHAR,
    tier            VARCHAR,
    _cdc_op         VARCHAR,    -- 'c' (create), 'u' (update), 'd' (delete)
    _cdc_lsn        BIGINT,     -- source log sequence number
    _cdc_timestamp  TIMESTAMP,  -- event timestamp from source
    _loaded_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

BigQuery

Preferred ingestion API: Storage Write API (committed mode)

The BigQuery Storage Write API provides exactly-once semantics and second-range latency. The legacy streaming insert API is now deprecated for new pipelines. The committed write mode creates a writer stream, appends rows, and flushes on commit - similar in concept to Snowpipe Streaming.

Architecture:

CDC → Kafka → BigQuery Kafka Connector (Storage Write API)
                      │
                      ▼
              BigQuery Committed Write Stream
                      │ (seconds)
                      ▼
              BigQuery Table (immediately queryable)

Latency characteristics:

  • Legacy streaming insert: seconds (deprecated, higher cost)
  • Storage Write API (committed): 5–15 seconds
  • Storage Write API (pending mode): batch-like, lower cost, higher latency

Partitioning strategy: Partition BigQuery tables by _cdc_timestamp or DATE(_cdc_timestamp) to improve query performance and control retention costs. Cluster by the primary key column for fast point lookups.

Schema evolution: BigQuery supports adding nullable columns automatically. Use the connector’s schema update option to propagate new columns from CDC events. Non-nullable columns and type changes require migration scripts.


Databricks Delta Lake

Preferred ingestion API: Delta Lake auto loader + structured streaming, or Delta Kafka connector

Delta Lake on Databricks supports ACID transactions, schema evolution, and time travel - making it well-suited for CDC targets that need historical queryability.

Two ingestion approaches:

Option A: Kafka → Delta (streaming write)

# Databricks Structured Streaming
spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "broker:9092") \
    .option("subscribe", "prod.orders.public.customers") \
    .load() \
    .selectExpr("CAST(key AS STRING) AS key",
                "CAST(value AS STRING) AS value") \
    .writeStream \
    .format("delta") \
    .option("checkpointLocation", "/delta/checkpoints/customers") \
    .toTable("customers_raw")

Option B: Apply CDC as MERGE (upsert)

from delta.tables import DeltaTable

# Use with foreachBatch: stream.writeStream.foreachBatch(upsert_to_delta)
def upsert_to_delta(batch_df, batch_id):
    target = DeltaTable.forName(spark, "customers")
    target.alias("t").merge(
        batch_df.alias("s"),
        "t.customer_id = s.customer_id"
    ).whenMatchedDelete(condition="s._cdc_op = 'd'") \
     .whenMatchedUpdateAll() \
     .whenNotMatchedInsertAll(condition="s._cdc_op != 'd'") \
     .execute()
    # Clause order matters: the conditional delete must come before the
    # unconditional update, or the update clause matches every row first
    # and the delete never fires. The insert condition skips delete events
    # for rows the target has never seen.

Apache Iceberg: The pattern is identical to Delta Lake. Iceberg’s MERGE INTO support and its open format (readable by Snowflake, BigQuery, and Trino without copying data) make it increasingly preferred for multi-engine analytical platforms.


ClickHouse

Preferred ingestion API: Native protocol / HTTP bulk insert

ClickHouse is purpose-built for high-throughput analytical ingestion and achieves sub-second query latency at scale. CDC delivery to ClickHouse requires careful attention to its table engine and deduplication model.

Recommended table engine for CDC targets: ReplacingMergeTree

CREATE TABLE customers
(
    customer_id    UInt64,
    email          String,
    tier           String,
    _cdc_version   UInt64,    -- use LSN or Unix timestamp millis
    _is_deleted    UInt8 DEFAULT 0,
    _updated_at    DateTime
)
ENGINE = ReplacingMergeTree(_cdc_version)
ORDER BY (customer_id);

ReplacingMergeTree deduplicates rows with the same ORDER BY key during background merges, keeping the row with the highest version number. This handles CDC upserts naturally.

Handling deletes: ReplacingMergeTree has no native concept of a delete event. The standard pattern is a soft delete: write a new row version with _is_deleted = 1 and filter deleted rows out in queries:

SELECT * FROM customers FINAL WHERE _is_deleted = 0;

The FINAL keyword forces deduplication at query time rather than waiting for background merges.

Latency: ClickHouse with direct streaming (no Kafka intermediate) achieves end-to-end latency under 1 second for most workloads. Batch size tuning matters: larger batches (thousands of rows) are more efficient than row-by-row inserts.
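The batching advice above can be sketched as a small insert buffer. The flush_fn callback is a stand-in for a real ClickHouse client call (e.g., an HTTP bulk insert), and the max_rows value is illustrative:

```python
class InsertBuffer:
    """Accumulate CDC rows and flush them in batches. Row-by-row inserts
    into ClickHouse create one data part per insert and generate heavy
    background-merge pressure; batching amortizes that cost."""

    def __init__(self, flush_fn, max_rows=5000):
        self.flush_fn = flush_fn  # called with a list of rows
        self.max_rows = max_rows
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []

# Demo: collect flushed batches in a list instead of inserting
batches = []
buf = InsertBuffer(batches.append, max_rows=3)
for i in range(7):
    buf.add({"customer_id": i})
buf.flush()  # flush the final partial batch
```

A production version would also flush on a timer so a quiet stream does not hold rows indefinitely.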


Elasticsearch / OpenSearch

Preferred ingestion API: Bulk API via Kafka Connect Elasticsearch connector

Elasticsearch is the standard target for full-text search and log analytics CDC use cases. The delivery pattern is typically hub-and-spoke with the Elasticsearch Kafka connector.

Index design for CDC targets:

{
  "mappings": {
    "properties": {
      "customer_id":    { "type": "keyword" },
      "email":          { "type": "keyword" },
      "name":           { "type": "text" },
      "tier":           { "type": "keyword" },
      "_cdc_timestamp": { "type": "date" },
      "_cdc_op":        { "type": "keyword" }
    }
  }
}

Handling deletes: The Elasticsearch connector supports tombstone record handling - a CDC delete event (with null value) triggers a document delete in the index. Configure the connector’s behavior.on.null.values to delete.
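The connector's tombstone handling amounts to the following mapping from a Kafka record to Bulk API actions. This is a hand-rolled sketch of what the connector does internally, not its actual code; the bulk_action name is made up:

```python
import json

def bulk_action(record, index):
    """Turn one Kafka (key, value) record into Elasticsearch Bulk API
    lines. A null value (a tombstone) becomes a document delete."""
    key, value = record
    if value is None:
        return [json.dumps({"delete": {"_index": index, "_id": key}})]
    return [
        json.dumps({"index": {"_index": index, "_id": key}}),  # action line
        value,                                                  # document line
    ]

lines = bulk_action(("42", '{"customer_id": "42", "tier": "gold"}'), "customers")
tomb = bulk_action(("42", None), "customers")
```

Using the CDC primary key as the document `_id` is what makes redelivered events idempotent: re-indexing the same document simply overwrites it.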

Schema evolution: Elasticsearch uses dynamic mapping by default, which will infer types for new fields. For production, use explicit mappings and update them before new fields appear in CDC events to avoid mapping conflicts.


Latency and Throughput Comparison

Pattern                                            Typical Latency           Max Throughput   Operational Complexity
Direct: CDC → ClickHouse                           < 1 second                Very high        Low
Direct: CDC → Elasticsearch                        1–3 seconds               High             Low
Hub-and-spoke: → Snowflake (Snowpipe Streaming)    5–30 seconds              High             Medium
Hub-and-spoke: → BigQuery (Storage Write API)      5–15 seconds              High             Medium
Hub-and-spoke: → Delta Lake (streaming)            10–60 seconds             High             Medium
Transform-in-flight: Flink → any destination       +5–30 seconds over base   High             High

Exactly-Once vs. At-Least-Once Delivery

CDC pipelines almost universally provide at-least-once delivery: in the event of a failure and restart, some events may be redelivered. For idempotent destinations (ClickHouse ReplacingMergeTree, Elasticsearch document IDs, Snowflake MERGE), at-least-once is sufficient - duplicates are handled by the destination’s deduplication mechanism.

Exactly-once delivery requires transactional coordination between the source, the broker, and the destination. Flink provides end-to-end exactly-once when paired with a transactional Kafka source and a transactional sink (Snowflake, Iceberg). This eliminates the need for destination-side deduplication but adds latency (transactions commit in checkpoint intervals, typically 1–30 seconds) and operational complexity.

For most CDC delivery use cases, at-least-once with idempotent destination writes is the pragmatic choice.


Schema Change Handling Across Destinations

Schema changes - column additions, renames, type changes - are the most operationally sensitive aspect of CDC pipelines. A reliable CDC platform like Streamkap propagates schema changes automatically, pausing affected pipelines if a destination cannot accept the new schema and alerting operators.

Destination     Column Add                    Column Drop           Type Change
Snowflake       Automatic                     Manual                Manual
BigQuery        Automatic (nullable)          Manual                Manual
ClickHouse      Requires ALTER TABLE          Requires ALTER TABLE  Manual
Elasticsearch   Auto (dynamic mapping)        Index rebuild         Index rebuild
Delta Lake      Automatic (schema evolution)  Manual                Manual
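For destinations in the "Requires ALTER TABLE" column, the propagation step is mechanical: diff the old and new event schemas and emit the missing column definitions. A minimal sketch, assuming schemas are represented as name-to-type dicts and using ClickHouse DDL syntax for illustration:

```python
def added_columns(old, new):
    """Columns present in the new CDC event schema but not the old one."""
    return {name: typ for name, typ in new.items() if name not in old}

def clickhouse_alters(table, old, new):
    """Generate ALTER TABLE ... ADD COLUMN statements for a destination
    that does not add columns automatically."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {typ}"
        for name, typ in added_columns(old, new).items()
    ]

stmts = clickhouse_alters(
    "customers",
    {"customer_id": "UInt64", "email": "String"},
    {"customer_id": "UInt64", "email": "String", "tier": "String"},
)
```

Drops and type changes are deliberately absent: as the table above shows, those are manual operations everywhere, and automating them risks data loss.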

Architecture Decision Guide

When choosing a CDC delivery pattern, answer these questions in order:

  1. How many destinations? One → consider direct streaming. More than one → hub-and-spoke.
  2. What latency is required? Sub-second → direct or direct-to-ClickHouse/Elasticsearch. Seconds → Snowpipe Streaming / Storage Write API. Minutes → any pattern.
  3. Do events need transformation before landing? Yes → add transform-in-flight tier. No → skip Flink complexity.
  4. Do consumers need different schemas? Yes → Kafka + per-consumer transformation. No → single sink.
  5. Is replay required? Yes → route through Kafka for configurable retention. No → direct is fine.
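The first-order branches of this decision guide can be codified as a simple heuristic. This deliberately ignores latency SLAs, per-consumer schemas, and team capacity, which the questions above also weigh:

```python
def choose_pattern(destinations, needs_transform, needs_replay):
    """First-pass CDC delivery pattern choice from three of the
    questions in the decision guide."""
    if needs_transform:
        return "transform-in-flight"   # processing tier required
    if destinations > 1 or needs_replay:
        return "hub-and-spoke"         # Kafka for fan-out and/or replay
    return "direct"                    # single target, lowest latency
```

For example, one ClickHouse target with no transformation or replay needs maps to direct streaming, while three analytical targets map to hub-and-spoke.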

There is no universally correct pattern. Direct streaming minimizes complexity and latency. Hub-and-spoke maximizes flexibility. Transform-in-flight enables data product architecture. The right answer depends on how many consumers you serve, what SLAs they require, and how much operational overhead your team can sustain.