Migrating from Self-Managed Debezium to Managed CDC
A practical guide for migrating from self-managed Debezium to a managed CDC service. Covers operational costs, migration planning, zero-downtime cutover, and feature comparison.
Debezium is a solid piece of open-source software. It does exactly what it says: it captures changes from your database transaction log and publishes them to Kafka. The problem is not Debezium itself. The problem is everything around it.
Running Debezium in production means running Kafka Connect in production, which means running Kafka in production, which means running ZooKeeper (or KRaft) in production. Each layer adds operational surface area. Each layer can fail independently. And when something breaks at 3 AM, it is your team’s problem.
This guide covers the practical steps to migrate from a self-managed Debezium setup to a managed CDC service, with a focus on doing it without losing data or taking downtime.
The Real Cost of Self-Managed Debezium
Before we talk about migration, let’s be honest about what you are currently paying. Not just in dollars, but in engineering hours.
The Operational Surface Area
A typical Debezium deployment looks something like this:
┌─────────────┐     ┌────────────────┐     ┌─────────────┐     ┌──────────────┐
│  Source DB  │────▶│ Kafka Connect  │────▶│    Kafka    │────▶│  Consumers   │
│ (Postgres,  │     │  + Debezium    │     │  + Schema   │     │ (your apps)  │
│  MySQL...)  │     │  Connectors    │     │  Registry   │     │              │
└─────────────┘     └────────────────┘     └─────────────┘     └──────────────┘
Each box in that diagram is something your team has to:
- Provision and size (how many nodes? what instance types?)
- Monitor (is it healthy? is it falling behind?)
- Upgrade (new Debezium version? new Kafka version? compatibility matrix?)
- Secure (TLS certificates, SASL authentication, network policies)
- Back up (connector offsets, schema registry subjects, Kafka topic configs)
- Recover when it fails (and it will fail)
The Incidents You Know Too Well
If you have been running Debezium for more than six months, you have probably seen at least a few of these:
Connector silently stops capturing changes. The connector shows as “RUNNING” in the Kafka Connect REST API, but no new records are appearing in the target topic. The WAL (Write-Ahead Log) on the database is growing. You check the logs and find a serialization error buried in thousands of INFO lines. The connector hit an unsupported data type or a schema change it could not handle, and it stopped processing without changing its status to “FAILED.”
Replication slot bloat. Your PostgreSQL replication slot has been holding onto WAL segments because the connector fell behind. The database disk is filling up. You need to decide: drop the slot and lose your position, or provision more disk and hope the connector catches up.
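On PostgreSQL you can catch slot bloat before the disk fills up. A quick check (assuming PostgreSQL 10 or later, where `pg_current_wal_lsn` is available):

```sql
-- How much WAL each replication slot is pinning on disk
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```

If `retained_wal` keeps growing while the slot still shows `active`, the connector is falling behind rather than disconnected, and "wait and hope" only works if the disk can absorb the backlog.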
Schema registry conflicts. Someone added a column to a source table, and the new schema is not backward-compatible with what the schema registry expects. Now your consumers are failing to deserialize messages. You need to update the compatibility mode, register the new schema, and restart the connector, all while data is piling up.
Kafka Connect rebalancing storms. You restarted one connector, and the entire Kafka Connect cluster decided to rebalance all tasks across all workers. Every connector paused for 30 seconds while the group coordinator reassigned tasks.
Each of these incidents costs real engineering time. Multiply by the number of connectors you run and the frequency of incidents, and you start to see the true cost.
Calculating Total Cost of Ownership
Here is a rough framework for estimating what Debezium is actually costing you:
| Cost Category | Typical Monthly Cost |
|---|---|
| Kafka cluster infrastructure (3-5 brokers) | $1,500 - $5,000 |
| Kafka Connect cluster (2-3 workers) | $500 - $1,500 |
| Schema Registry (2 nodes) | $200 - $500 |
| Monitoring and alerting tooling | $300 - $800 |
| Engineering time: incident response (10-20 hrs/month) | $2,500 - $5,000 |
| Engineering time: upgrades and maintenance (5-10 hrs/month) | $1,250 - $2,500 |
| Engineering time: connector configuration and tuning | $1,000 - $2,000 |
| Total | $7,250 - $17,300 |
The infrastructure cost is often the smallest line item. The engineering time is where the real money goes.
What a Managed CDC Service Gives You
A managed CDC service takes the entire middle section of that architecture diagram and replaces it with an API call. You tell it “capture changes from this PostgreSQL database and send them to this Snowflake warehouse,” and it handles everything in between.
Features You Get Without Building Them
- Automatic schema evolution: When a column is added to the source table, the service propagates the change downstream without manual intervention.
- Built-in monitoring and alerting: Pipeline health, throughput metrics, and latency are visible in a dashboard without setting up Prometheus, Grafana, or custom alerts.
- Guaranteed delivery: The service manages offsets and retries internally. You do not need to worry about connector offset storage or commit intervals.
- Connector lifecycle management: No more Kafka Connect REST API calls to check status, restart failed tasks, or update configurations.
- Automatic scaling: The service adjusts resources based on your throughput without you resizing instances.
Feature Comparison
| Capability | Self-Managed Debezium | Managed CDC (e.g., Streamkap) |
|---|---|---|
| Initial setup time | Days to weeks | Minutes |
| Schema evolution | Manual (schema registry config) | Automatic |
| Monitoring | BYO (Prometheus/Grafana) | Built-in dashboard |
| Connector failure recovery | Manual restart + investigation | Automatic retry + alerting |
| Scaling | Manual instance resizing | Automatic |
| Kafka cluster management | Your responsibility | Handled by service |
| Upgrade path | Manual, risky | Managed by provider |
| Snapshot management | Manual configuration | Built-in, configurable |
Planning the Migration
A successful migration starts with an inventory of what you are currently running and a clear picture of what needs to move.
Step 1: Inventory Your Connectors
Document every Debezium connector you are running. For each one, record:
- Source database type and version
- Tables being captured
- Any Single Message Transforms (SMTs) applied
- Destination topic naming convention
- Downstream consumers of each topic
- Current throughput (messages per second)
# Get all connector configs from Kafka Connect
curl -s http://connect-host:8083/connectors | jq -r '.[]' | while read -r connector; do
  echo "=== $connector ==="
  curl -s "http://connect-host:8083/connectors/$connector/config" | jq .
  echo ""
done > connector-inventory.txt  # mixed headers and JSON, so not a valid .json file
Step 2: Map Your SMTs
If you are using Single Message Transforms, you need to understand whether the managed service supports equivalent transformations. Common SMTs and their managed equivalents:
| Debezium SMT | Purpose | Managed CDC Equivalent |
|---|---|---|
| `ExtractNewRecordState` | Flatten the change event envelope | Usually the default output format |
| `RegexRouter` | Rename target topics | Topic naming configuration |
| `Filter` | Drop certain events | Filter rules or table selection |
| `TimestampConverter` | Format timestamp fields | Schema mapping configuration |
| `InsertField` | Add metadata fields | Often included automatically |
For custom SMTs you have written yourself, you will need to verify the managed service can replicate that logic, either through built-in transformations or through a downstream processing step.
Step 3: Plan Your Destination Mapping
Your current Debezium setup writes to Kafka topics. A managed CDC service may write directly to your analytical destination (Snowflake, BigQuery, Databricks, ClickHouse) without Kafka as an intermediary. Decide whether you want to:
Option A: Replace the entire pipeline. The managed service reads from your source database and writes directly to your destination. This is simpler and eliminates Kafka as a dependency.
Option B: Replace only the CDC capture layer. The managed service reads from your source database and writes to Kafka topics, and your existing consumers continue reading from Kafka. This is a smaller change but keeps the Kafka dependency.
Streamkap supports both patterns. It can stream CDC data directly to destinations like Snowflake, BigQuery, and ClickHouse, or it can land data into Kafka topics for downstream consumption.
Executing a Zero-Downtime Migration
The key principle: never stop your existing pipeline until the new one is proven.
Step 1: Set Up the Managed CDC Service
Configure the managed service to connect to your source database. For PostgreSQL, this means:
- Create a new replication slot for the managed service (do not reuse the Debezium slot).
- Create a dedicated database user with replication privileges.
- Configure the managed service with your connection details and table selection.
-- Create a dedicated replication slot for the managed CDC service
SELECT pg_create_logical_replication_slot('managed_cdc_slot', 'pgoutput');
-- Create a dedicated user
CREATE USER managed_cdc_user WITH REPLICATION PASSWORD 'secure_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO managed_cdc_user;
-- Ensure tables created later are also readable by the CDC user
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO managed_cdc_user;
Step 2: Run the Initial Snapshot
The managed service will perform an initial snapshot of your selected tables. This is equivalent to Debezium’s snapshot.mode=initial. During this phase:
- Your existing Debezium connectors continue running normally.
- The managed service reads the current state of each table.
- No data is lost because both systems are capturing changes independently.
Step 3: Validate During Parallel Running
Once the managed service has completed its snapshot and is streaming real-time changes, run both systems in parallel for at least one week. Compare:
- Record counts: Query both destinations for the same time window. Counts should match within the expected delivery latency.
- Data correctness: Spot-check specific records. Pick a few rows that were updated in the source and verify the changes appeared correctly in both destinations.
- Latency: Measure the time from a change in the source database to its appearance in the destination. The managed service should be comparable or faster.
- Schema changes: If possible, make a test schema change (add a nullable column to a non-critical table) and verify both systems handle it correctly.
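For the record-count check, a windowed count against each destination is usually enough. A sketch, assuming a hypothetical `orders` table with an `updated_at` timestamp column (substitute your own tables and change-tracking column; interval syntax varies slightly between Postgres, Snowflake, and BigQuery):

```sql
-- Run the same query against both destinations and diff the results
SELECT count(*) AS row_count,
       max(updated_at) AS latest_change
FROM orders
WHERE updated_at >= now() - interval '1 hour';
```

Small differences at the edge of the window are normal delivery lag; persistent divergence that grows over time is the signal to investigate before cutting over.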
Step 4: Cut Over Downstream Consumers
Once you are confident in the managed service’s output:
- Redirect downstream consumers to the managed service’s output.
- Monitor for any issues for 24-48 hours.
- Stop the Debezium connectors.
- Drop the old replication slot to stop WAL retention.
-- After decommissioning Debezium, confirm the old slot is no longer in use
SELECT active FROM pg_replication_slots WHERE slot_name = 'debezium_slot';
-- Then drop it so it stops pinning WAL
SELECT pg_drop_replication_slot('debezium_slot');
Step 5: Decommission the Old Infrastructure
Once you are confident everything is working:
- Shut down the Kafka Connect workers.
- If no other workloads depend on Kafka, evaluate whether you still need the Kafka cluster.
- Remove the monitoring and alerting configurations for the old pipeline.
- Update your runbooks and on-call documentation.
Handling Edge Cases
Large Tables with Slow Snapshots
If you have tables with hundreds of millions of rows, the initial snapshot can take hours. During this time, changes are still being captured from the WAL, so you will not lose data. However, your destination will not have the complete dataset until the snapshot finishes. Plan your cutover timing accordingly.
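On PostgreSQL you can estimate which tables will dominate snapshot time before you schedule the cutover. The row counts come from the planner's statistics, so run `ANALYZE` first if you want fresh numbers:

```sql
-- Largest tables by on-disk size, with approximate row counts
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
       reltuples::bigint AS approx_rows
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```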
Tables with Custom Types or Extensions
Debezium has specific handling for PostgreSQL types like hstore, PostGIS geometry, ltree, and citext. Verify that your managed CDC service supports these types. If not, you may need to add a transformation step or use a different column type in your source schema.
High-Throughput Tables
If you have tables that generate thousands of changes per second, verify the managed service can handle the throughput. Most managed CDC services are designed for high throughput, but it is worth testing with realistic load before cutting over.
Ask the managed service provider about:
- Maximum sustained throughput per pipeline
- How the service handles back-pressure when the destination is slow
- Whether there are any per-table or per-database limits
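To answer those questions with real numbers, PostgreSQL's statistics views can give you a rough per-table change rate. The counters are cumulative since the last stats reset, so sample them twice a few minutes apart and divide the difference by the interval:

```sql
-- Cumulative row changes per table since the last statistics reset
SELECT relname AS table_name,
       n_tup_ins + n_tup_upd + n_tup_del AS total_changes
FROM pg_stat_user_tables
ORDER BY total_changes DESC
LIMIT 10;
```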
After the Migration
Once you have successfully migrated, you will notice a few changes to your daily workflow.
What Gets Easier
- No more 3 AM pages for connector failures. The managed service handles retries and recovery automatically.
- Schema changes just work. Add a column, and it flows through to your destination without manual intervention.
- No more version compatibility research. You do not need to check whether Debezium 2.x works with Kafka 3.x and your specific database version.
- Capacity planning becomes the provider’s problem. You do not need to forecast growth and pre-provision infrastructure.
What to Watch For
- Vendor lock-in: Understand the managed service’s data format and whether you can export to standard formats if needed.
- Cost at scale: Managed services typically charge based on data volume or row count. Model your costs at 2x and 5x current volume.
- Feature gaps: If you were using advanced Debezium features like event flattening with custom logic, verify the managed service supports equivalent functionality.
Streamkap is built specifically for real-time CDC workloads and supports the most common source databases and analytical destinations. If you are tired of managing Debezium infrastructure and want to focus on building data products instead, it is worth evaluating whether a managed service fits your needs.