
Engineering

February 25, 2026

14 min read

Best CDC Tools Compared: A 2026 Guide to Change Data Capture Platforms

A thorough comparison of the leading CDC tools in 2026 - Debezium, Fivetran, AWS DMS, Airbyte, Streamkap, Striim, and HVR/Qlik - evaluated on latency, deployment model, pricing, connector breadth, and stream processing.

TL;DR:

• CDC tools vary enormously in architecture: self-hosted open source (Debezium), fully managed SaaS (Fivetran, Streamkap), cloud-native (AWS DMS), and hybrid (Airbyte).

• Latency ranges from sub-second for native log-based CDC platforms to 15+ minutes for batch-oriented tools.

• Pricing models differ just as much as architecture - per-connector, per-row, per-MAR, or usage-based - so total cost of ownership depends heavily on your data volumes and team size.

• Built-in stream processing (transformations, filtering, enrichment in-flight) is a key differentiator that eliminates the need for a separate Flink or Spark cluster.

Change Data Capture has moved from a niche replication technique to a foundational component of modern data infrastructure. Whether you are building real-time analytics, keeping microservice caches in sync, feeding AI pipelines, or migrating databases with zero downtime, the CDC tool you choose will shape the reliability, cost, and operational complexity of your entire data stack.

This guide compares the seven most widely used CDC platforms as of 2026: Debezium, Fivetran, AWS DMS, Airbyte, Streamkap, Striim, and HVR/Qlik Replicate. For each platform we cover architecture, latency, managed versus self-hosted tradeoffs, pricing model, connector breadth, and stream processing capabilities.


What to Look for in a CDC Tool

Before diving into the comparisons, it helps to define the dimensions that matter most:

Latency - How quickly do changes appear in the destination after they are committed in the source? Sub-second end-to-end latency is achievable with log-based CDC; batch-oriented tools may introduce delays of minutes or hours.

Deployment model - Self-hosted open source, cloud-managed SaaS, or cloud-provider native? Each carries different tradeoffs for operational burden, cost, and vendor lock-in.

Pricing - Per connector, per row synced, per monthly active row (MAR), or usage-based? At scale, pricing model often matters more than headline price.

Connector breadth - How many source databases and destination systems does the tool support? Does it cover your specific versions and cloud variants?

Stream processing - Can the tool filter, transform, enrich, or route data in-flight, or does raw data land in the destination unchanged?

Schema evolution - How does the tool handle ALTER TABLE and other DDL changes without breaking pipelines?
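As a concrete illustration of the latency dimension, end-to-end lag can be estimated by comparing the source commit timestamp carried in a change event with the time the event reaches the consumer. The event shape below is a hypothetical simplification (the field name source_commit_ts_ms is invented for this sketch); real tools expose commit time under their own field names, e.g. Debezium carries ts_ms in the event's source block.

```python
import time

def end_to_end_lag_ms(event: dict, arrival_ms: int) -> int:
    """Estimate CDC lag: arrival time minus source commit time (both epoch ms)."""
    commit_ms = event["source_commit_ts_ms"]  # hypothetical field name
    return arrival_ms - commit_ms

# Simulated change event committed 250 ms before "now".
now_ms = int(time.time() * 1000)
event = {"op": "u", "source_commit_ts_ms": now_ms - 250}
print(f"estimated lag: {end_to_end_lag_ms(event, now_ms)} ms")  # → estimated lag: 250 ms
```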


Debezium

Architecture

Debezium is an open-source CDC framework built on Apache Kafka Connect. It ships connectors for PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, Cassandra, and others. Changes are captured from the database transaction log and published to Kafka topics, from which any Kafka consumer can read.
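To give a feel for how a Debezium pipeline is wired up, here is a minimal sketch of a PostgreSQL connector registration payload, built in Python for readability. In practice this JSON is POSTed to the Kafka Connect REST API; the hostnames, credentials, and table list are placeholders.

```python
import json

# Minimal Debezium PostgreSQL connector config (all values are placeholders).
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",            # logical decoding plugin
        "database.hostname": "db.example.com",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "********",
        "database.dbname": "inventory",
        "topic.prefix": "inventory",          # Kafka topic namespace (Debezium 2.x)
        "table.include.list": "public.orders,public.customers",
    },
}
print(json.dumps(connector, indent=2))
```

Each captured table then appears as its own Kafka topic under the configured prefix (e.g. inventory.public.orders).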

Latency

Sub-second. Debezium reads the WAL or binlog directly and publishes to Kafka in near-real time. End-to-end latency from commit to Kafka consumer is typically under a second on well-provisioned infrastructure.

Deployment Model

Fully self-hosted. You are responsible for provisioning Kafka (or Confluent Platform / Amazon MSK), deploying Kafka Connect workers, managing connector configurations, monitoring consumer lag, and handling failure recovery. This is a meaningful operational investment, particularly for teams without dedicated data engineering resources.

Pricing

Free and open source. The real cost is infrastructure (Kafka cluster, compute, storage) and engineering time.

Connector Breadth

Strong for relational databases. Debezium covers the major open-source and commercial databases, though Oracle and Db2 support can require additional licensing and configuration complexity.

Stream Processing

None built in. You need to combine Debezium with a stream processor - Apache Kafka Streams, Apache Flink, or ksqlDB - to transform, filter, or enrich data in-flight. This is a powerful but complex architecture.
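To make that division of labor concrete, here is a pure-Python sketch of the kind of in-flight logic a stream processor layered on Debezium performs - dropping deletes and keeping only the "after" image of each change. The envelope fields (op, before, after) follow Debezium's documented change-event format; the real plumbing (a Kafka consumer, a Flink or Kafka Streams job) is omitted.

```python
# Sketch of a downstream filter/transform over Debezium change events.
# op codes: "c" = create, "u" = update, "d" = delete, "r" = snapshot read.

def to_upsert(event: dict):
    """Keep inserts/updates/snapshot rows; return the row's new state."""
    if event["op"] in ("c", "u", "r"):
        return event["after"]
    return None  # deletes are dropped in this sketch

events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 1, "status": "paid"}, "after": None},
]
upserts = [row for e in events if (row := to_upsert(e)) is not None]
print(upserts)  # → [{'id': 1, 'status': 'new'}, {'id': 1, 'status': 'paid'}]
```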

Verdict

Debezium is the right choice for teams with strong Kafka expertise who want maximum control and are not paying for managed infrastructure elsewhere. For everyone else, the operational overhead is substantial.


Fivetran

Architecture

Fivetran is a fully managed ELT platform that added CDC capabilities to its connector catalog. It uses a connector-per-source model where each source runs on Fivetran’s managed infrastructure. Data is delivered to destinations like Snowflake, BigQuery, Databricks, and Redshift.

Latency

Fivetran’s CDC connectors offer lower latency than its traditional batch connectors, but the platform is designed primarily for data warehousing workflows rather than operational real-time use cases. Expect latencies in the range of seconds to low minutes depending on the destination and configuration.

Deployment Model

Fully managed SaaS. No infrastructure to provision or manage.

Pricing

Fivetran charges per Monthly Active Row (MAR) - the number of distinct rows synced or updated in a given month. At high data volumes, MAR-based pricing can become expensive, particularly for tables with frequent updates where the same rows are touched many times.
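The cost implication is easiest to see with a toy example: MAR counts distinct rows touched in the month, so a hot table where the same rows are updated thousands of times bills the same as one update per row, while churn spread across many rows grows the bill linearly. (MAR semantics are simplified here for illustration.)

```python
# Each change event is (table, primary_key). MAR counts each distinct row
# once per month, regardless of how many times it changed.
changes = [
    ("orders", 1), ("orders", 1), ("orders", 1),  # same row updated 3x
    ("orders", 2),
    ("customers", 7), ("customers", 8),
]
total_change_events = len(changes)        # 6 events captured
monthly_active_rows = len(set(changes))   # 4 distinct rows billed
print(total_change_events, monthly_active_rows)  # → 6 4
```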

Connector Breadth

Excellent. Fivetran has one of the largest connector catalogs in the industry, covering hundreds of SaaS applications, databases, and file sources beyond CDC.

Stream Processing

Limited. Fivetran is an EL(T) tool - it moves data to the destination and expects transformations to happen in the warehouse using dbt or similar.

Verdict

Fivetran is a strong choice if you need to consolidate many disparate data sources into a warehouse and value connector breadth over low latency. It is not the right tool for operational real-time use cases.


AWS Database Migration Service (DMS)

Architecture

AWS DMS is a cloud-native replication service from Amazon Web Services. It supports full-load and ongoing CDC replication between databases, including cross-engine migrations (e.g., Oracle to Aurora PostgreSQL). It runs as a managed replication instance within your AWS account.
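For a sense of how DMS tasks are scoped, here is a sketch of the table-mapping rules a replication task accepts (schema and table names are placeholders). A "%" wildcard selects all tables in a schema; further rules can exclude or rename objects.

```python
import json

# Sketch of table-mapping rules for a DMS replication task.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-hr-schema",
            "object-locator": {"schema-name": "hr", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}
print(json.dumps(table_mappings))
```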

Latency

Seconds to low minutes for CDC replication, depending on instance size and network configuration.

Deployment Model

Managed by AWS, but runs in your AWS account. You provision replication instances, configure endpoints, and monitor tasks via the AWS console or API. Less operational burden than Debezium, but more hands-on than a fully managed SaaS platform.

Pricing

Billed by the hour based on replication instance size, plus data transfer costs. Predictable for steady workloads, but can be difficult to estimate for variable-volume pipelines.

Connector Breadth

Focused on databases, both AWS-native (RDS, Aurora, DynamoDB) and third-party (Oracle, SQL Server, PostgreSQL, MySQL, MongoDB). Not designed for SaaS sources.

Stream Processing

None. DMS moves data as-is. Transformations require downstream processing in services like AWS Glue, Lambda, or EMR.

Verdict

AWS DMS is the practical choice for AWS-centric teams doing database migrations or cross-engine replication within the AWS ecosystem. It is not a general-purpose data integration platform.


Airbyte

Architecture

Airbyte is an open-source data movement platform with both self-hosted (Airbyte OSS) and cloud-managed (Airbyte Cloud) options. It supports CDC via log-based connectors for major databases and offers a large community-contributed connector catalog.

Latency

CDC connectors in Airbyte deliver latency in the range of seconds to minutes. The platform is optimized for reliability and breadth rather than sub-second performance.

Deployment Model

Self-hosted (Kubernetes or Docker) or fully managed via Airbyte Cloud. The self-hosted version requires significant DevOps investment; Airbyte Cloud removes that burden.

Pricing

Airbyte Cloud charges per credit consumed, where credits are based on data volume and connector type. Open-source self-hosted is free but carries infrastructure costs.

Connector Breadth

Very broad. Airbyte has one of the largest connector catalogs of any open-source platform, with hundreds of connectors across databases, SaaS apps, and file systems, many contributed by the community.

Stream Processing

Limited. Airbyte is focused on data movement. Some basic transformations are available, but complex stream processing requires a separate tool.

Verdict

Airbyte is a good fit for teams that need wide connector coverage and are comfortable with either self-hosting or the Airbyte Cloud pricing model. For operational CDC with sub-second latency, look elsewhere.


Streamkap

Architecture

Streamkap is a fully managed CDC and streaming integration platform built natively on Apache Kafka and Apache Flink. It connects directly to source database transaction logs and delivers changes to destinations via a streaming pipeline, with in-flight transformations handled by the underlying Flink engine.

Latency

Sub-second end-to-end. Streamkap is built for operational use cases - real-time analytics, cache invalidation, AI/ML feature pipelines - where data freshness is measured in milliseconds, not minutes.

Deployment Model

Fully managed SaaS. There is no Kafka cluster to provision, no Flink cluster to configure, and no Kafka Connect workers to manage. Pipelines are configured and monitored through a web UI or API.

Pricing

Usage-based pricing tied to data volume and pipeline throughput rather than per-MAR or per-connector seat fees, making costs more predictable at scale.

Connector Breadth

Focused on databases and streaming destinations: PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, and others on the source side; Snowflake, Databricks, Kafka, ClickHouse, Elasticsearch, and others on the destination side. The catalog is narrower than Fivetran or Airbyte but is purpose-built for streaming use cases.

Stream Processing

Native. Because Streamkap is built on Flink, transformations, filtering, column masking, data type coercion, and routing logic can be applied in-pipeline without standing up a separate processing cluster. This is a significant architectural advantage for teams that need to clean or reshape data before it reaches the destination.
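As an illustration of the kind of in-pipeline transformation described above - not Streamkap's actual configuration API - here is a sketch of column masking applied to a change record before it reaches the destination:

```python
import hashlib

def mask_columns(row: dict, sensitive: set) -> dict:
    """Replace sensitive column values with a short, stable hash."""
    return {
        k: hashlib.sha256(str(v).encode()).hexdigest()[:12] if k in sensitive else v
        for k, v in row.items()
    }

record = {"id": 42, "email": "jane@example.com", "amount": 19.99}
masked = mask_columns(record, sensitive={"email"})
print(masked["id"], masked["amount"])  # non-sensitive columns pass through unchanged
```

A stable hash (rather than a random value) preserves joinability on the masked column while hiding the raw value.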

Verdict

Streamkap is the right choice for teams that need sub-second latency, want to avoid managing Kafka and Flink infrastructure, and need built-in stream processing without adding a separate layer to the architecture. It is particularly well-suited for AI/ML pipelines, real-time analytics, and operational data synchronization.


Striim

Architecture

Striim is an enterprise streaming integration and CDC platform with both cloud-managed and self-hosted deployment options. It is built on an in-memory stream processing engine and includes a SQL-like query language for defining transformations.

Latency

Sub-second to low seconds for CDC workloads.

Deployment Model

Available as a managed cloud service or on-premises / self-hosted deployment. Striim targets enterprise customers and includes role-based access control, compliance features, and enterprise support.

Pricing

Enterprise licensing. Pricing is not publicly listed and typically involves negotiated annual contracts.

Connector Breadth

Strong for enterprise databases including Oracle, SQL Server, SAP HANA, and IBM Db2, as well as cloud databases and streaming platforms.

Stream Processing

Native stream processing with a SQL-like language for transformations, filtering, aggregations, and joins.

Verdict

Striim is a strong option for large enterprises with complex legacy database environments - particularly Oracle and SAP - and compliance requirements that necessitate on-premises or private cloud deployment. The enterprise pricing model and licensing complexity make it less accessible for smaller teams.


HVR / Qlik Replicate

Architecture

HVR (now part of Qlik as Qlik Replicate) is an enterprise CDC and data replication platform. It is optimized for high-throughput database replication, particularly across heterogeneous environments - different database vendors, versions, and cloud providers.

Latency

Low seconds to minutes, depending on configuration and destination type.

Deployment Model

Self-hosted or managed. HVR/Qlik Replicate is primarily deployed in enterprise on-premises or private cloud environments, though cloud-hosted options exist.

Pricing

Enterprise licensing with negotiated pricing.

Connector Breadth

Broad support for enterprise databases including Oracle, SQL Server, Db2, SAP HANA, Teradata, and cloud databases, with strong coverage for Snowflake and Databricks destinations.

Stream Processing

Basic transformation capabilities. Complex processing typically requires a downstream tool.

Verdict

HVR/Qlik Replicate is a mature enterprise platform well-suited for large-scale database replication, particularly in regulated industries and environments with legacy databases. Overkill for most cloud-native teams.


Side-by-Side Comparison

| Tool | Latency | Deployment | Pricing Model | Stream Processing | Best For |
| --- | --- | --- | --- | --- | --- |
| Debezium | Sub-second | Self-hosted | Open source | None (external) | Kafka-native teams |
| Fivetran | Seconds–minutes | Fully managed | Per MAR | None (ELT) | SaaS source breadth |
| AWS DMS | Seconds–minutes | AWS managed | Hourly | None | AWS migrations |
| Airbyte | Seconds–minutes | Self-hosted / SaaS | Per credit | Limited | Wide connector coverage |
| Streamkap | Sub-second | Fully managed | Usage-based | Native (Flink) | Operational real-time |
| Striim | Sub-second | Enterprise / hybrid | Enterprise license | Native | Enterprise on-prem |
| HVR/Qlik | Seconds–minutes | Enterprise / hybrid | Enterprise license | Basic | Large-scale DB replication |

How to Choose

The right CDC tool depends less on feature checklists and more on your operational context:

If you have Kafka expertise and want full control, Debezium plus a managed Kafka service (Confluent Cloud, Amazon MSK) is a proven architecture at scale.

If you are consolidating many SaaS sources into a warehouse, Fivetran or Airbyte gives you connector breadth that no CDC-focused tool can match.

If you are entirely AWS-native and doing database migrations, AWS DMS is the path of least resistance.

If you need sub-second latency, built-in stream processing, and zero infrastructure to manage, Streamkap is purpose-built for that use case.

If you have Oracle or SAP HANA at the core and enterprise compliance requirements, Striim or HVR/Qlik Replicate deserves serious consideration.

The worst outcome is choosing a tool optimized for batch data warehousing when your architecture needs real-time operational data - or spending engineering cycles managing Kafka infrastructure when a managed platform would free that capacity for product work. Define your latency requirements and operational budget first, then let those constraints guide the evaluation.