CDC & Replication

March 12, 2026

Database Synchronization: Methods, Tools, and Architecture Guide

How to keep databases in sync — replication methods, CDC vs polling, conflict resolution, and tool recommendations for one-way and bidirectional sync.

TL;DR: Database synchronization keeps two or more databases consistent. CDC-based sync is the modern standard — it reads transaction logs for sub-second latency with minimal source impact.

Every production system eventually outgrows a single database. You add an analytics warehouse, a search index, a cache layer, maybe a second region for disaster recovery. The moment you have two copies of the same data, you have a synchronization problem: how do you keep the copies consistent, how quickly must they converge, and what happens when things go wrong?

This guide covers the four main approaches to database synchronization, walks through the architecture decisions you will face, and gives concrete guidance on when to use each approach.

What Database Synchronization Actually Means

Database synchronization is the process of propagating data changes from one database to one or more other databases so that they converge to the same state. That sounds simple. In practice, it involves handling inserts, updates, deletes, schema changes, network failures, ordering guarantees, and conflict resolution — all without taking down the source system or dropping data on the floor.

The two fundamental dimensions of any sync architecture are:

  • Direction: One-way (source to target) or bidirectional (both databases accept writes and sync to each other).
  • Latency: Batch (minutes to hours), near-real-time (seconds), or real-time (sub-second).

Your choice of sync method determines where you land on both axes.

Four Methods for Database Synchronization

1. Polling (Timestamp-Based Queries)

The most straightforward approach. A scheduler runs a query against the source database on an interval — every minute, every five minutes, every hour — selecting rows where updated_at > last_sync_timestamp.

How it works:

  1. Store the high-water mark (the last updated_at value you processed).
  2. Query: SELECT * FROM orders WHERE updated_at > :high_water_mark ORDER BY updated_at.
  3. Write results to the target database.
  4. Update the high-water mark.
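The loop above can be sketched in a few lines of Python, using sqlite3 as a stand-in for the source and target. The `orders` table, its columns, and the string high-water mark are illustrative, not a prescribed schema.

```python
import sqlite3

def poll_once(source, target, high_water_mark):
    """Copy rows changed since the last sync; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (high_water_mark,),
    ).fetchall()
    for row in rows:
        # Upsert so a re-processed batch is idempotent on the target.
        target.execute(
            "INSERT INTO orders (id, total, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET total=excluded.total, "
            "updated_at=excluded.updated_at",
            row,
        )
    # Advance the mark only after the batch has been applied.
    return rows[-1][2] if rows else high_water_mark
```

Note what the sketch cannot do: it never sees a DELETE, and rows committed with identical `updated_at` values near the window boundary need extra care (e.g. re-reading an overlap window) to avoid being missed.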

Where it works well: Small tables, low-frequency changes, systems where latency in the minutes-to-hours range is acceptable.

Where it breaks down:

  • Deletes are invisible. A polling query only sees rows that exist. If a row is deleted from the source, you never find out unless you add soft-delete columns or run periodic full-table diffs.
  • Source load scales with frequency. Polling every 10 seconds against a 500M-row table means running a range scan every 10 seconds. The source database pays for every query.
  • Clock skew and transaction boundaries. Rows committed at nearly the same time can have updated_at values that straddle your query window. Without careful handling, you either miss rows or process them twice.
  • No intermediate states. If a row is updated three times between polls, you only see the final value. The two intermediate states are lost.

Polling is easy to build and easy to reason about. It is also the first thing most teams replace when latency requirements tighten or the source database starts complaining about read load.

2. Trigger-Based Synchronization

Database triggers fire on INSERT, UPDATE, or DELETE and write the change to a staging table (often called a “shadow” or “audit” table). A separate process reads the staging table and pushes changes to the target.

How it works:

  1. Create AFTER INSERT/UPDATE/DELETE triggers on each source table.
  2. Triggers write the old and new row values to a _changes table with a sequence number.
  3. A sync process reads the _changes table, applies changes to the target, and deletes processed rows.
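The trigger pattern can be demonstrated end to end with sqlite3 (a real deployment would use PL/pgSQL, T-SQL, or the equivalent for your engine; the `customers` table and column names are invented for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
-- Shadow table: one row per change, ordered by an autoincrement sequence.
CREATE TABLE customers_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, old_email TEXT, new_email TEXT
);
CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, new_email)
    VALUES ('I', NEW.id, NEW.email);
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, old_email, new_email)
    VALUES ('U', NEW.id, OLD.email, NEW.email);
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, old_email)
    VALUES ('D', OLD.id, OLD.email);
END;
""")

db.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
db.execute("UPDATE customers SET email = 'b@example.com' WHERE id = 1")
db.execute("DELETE FROM customers WHERE id = 1")
changes = db.execute("SELECT op FROM customers_changes ORDER BY seq").fetchall()
```

Every operation, including the delete and the intermediate update, lands in `customers_changes` in commit order, which is exactly the change-level granularity this method buys you.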

Where it works well: Systems where you need every intermediate change (not just the latest state) and cannot use log-based CDC — for example, databases that do not expose a transaction log (some older MySQL configurations, certain cloud-managed databases with limited log access).

Where it breaks down:

  • Write amplification. Every write to the source table now generates a second write to the changes table. On write-heavy workloads, this doubles your I/O.
  • Lock contention. Triggers execute inside the same transaction as the original write. A slow trigger means a slow insert.
  • Schema coupling. Adding a column to the source table means updating the trigger and the changes table. This maintenance burden grows linearly with the number of synced tables.
  • No DDL capture. Triggers fire on data changes, not schema changes. You need a separate mechanism to detect and propagate DDL.

Trigger-based sync is a reasonable choice when you control both databases, have moderate write throughput, and need change-level granularity. It is not a good fit for high-throughput OLTP systems or situations where source database performance is a hard constraint.

3. Log-Based Change Data Capture (CDC)

CDC reads the database’s own transaction log — the write-ahead log (WAL) in PostgreSQL, the binlog in MySQL, the oplog in MongoDB, the redo log in Oracle. Every committed change appears in the log, in order, exactly once. A CDC process tails this log and emits change events.

How it works:

  1. Enable logical replication or log access on the source (e.g., wal_level=logical for PostgreSQL, binlog_format=ROW for MySQL).
  2. A CDC connector (Debezium, or a managed platform like Streamkap) reads the log from a known position.
  3. Each change event contains the operation type (insert/update/delete), the before and after values, and transaction metadata.
  4. The connector writes these events to the target database, a message bus (Kafka), or a destination connector.
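A consumer of these events only needs the operation type plus the before/after images. The sketch below applies Debezium-style envelopes (op codes 'c', 'u', 'd', and 'r' for snapshot reads) to a dict standing in for the target; the row shape is illustrative.

```python
def apply_event(target: dict, event: dict) -> None:
    """Apply one CDC change event to a key/value target."""
    op = event["op"]
    if op in ("c", "u", "r"):
        row = event["after"]
        target[row["id"]] = row       # insert, update, and snapshot all upsert
    elif op == "d":
        target.pop(event["before"]["id"], None)  # deletes are explicit events

target = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 1, "status": "paid"}, "after": None},
]
for e in events:
    apply_event(target, e)
```

After the delete event, the row is gone from the target as well: the case polling silently misses is an ordinary event here.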

Where it works well: Virtually every production synchronization use case. CDC captures all changes (including deletes), adds near-zero load to the source database (it reads the log, not the tables), preserves ordering, and delivers sub-second latency.

Where it breaks down:

  • Initial setup. Enabling log-based replication requires configuration changes on the source database. On managed databases (RDS, Cloud SQL), this is usually a parameter group change and a restart.
  • Log retention. If the CDC consumer falls behind and the source database has already purged the relevant log segments, you need to re-snapshot the table. Managed CDC platforms handle this automatically.
  • Heterogeneous schema mapping. When syncing between different database engines (PostgreSQL to Snowflake, MongoDB to BigQuery), the CDC platform must translate data types and structures. Not all tools handle this well.

Log-based CDC is the default recommendation for database synchronization in 2026. It is the only method that captures every change, including deletes, with minimal impact on the source.

For a deeper look at how CDC works, see the CDC capabilities overview.

4. Application-Level Synchronization

The application itself writes to multiple databases. Instead of relying on database-layer replication, your application code (or a middleware layer) issues writes to both the primary database and the sync targets.

How it works:

  1. Application writes to the primary database.
  2. After a successful commit, the application publishes an event (to a message queue, event bus, or directly to the second database).
  3. A consumer applies the event to the target.

This is sometimes called the “outbox pattern” when the event is written to an outbox table in the same transaction as the primary write, then published asynchronously.
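A minimal outbox sketch, again using sqlite3 and an invented `orders`/`outbox` schema: the business row and its event commit in one transaction, so a consumer polling the outbox can never observe one without the other.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
""")

def place_order(order_id: int, total: float) -> None:
    # The connection context manager wraps both inserts in one
    # transaction: either both commit or both roll back.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "order_placed", "id": order_id,
                         "total": total}),),
        )

place_order(42, 99.5)
pending = [json.loads(p) for (p,) in
           db.execute("SELECT payload FROM outbox ORDER BY seq")]
```

A separate relay then publishes `pending` events to the bus and marks them processed, which is where the added complexity lives.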

Where it works well: Microservices architectures where services already communicate via events. Teams that want fine-grained control over what data is synchronized and how it is transformed.

Where it breaks down:

  • Dual-write consistency. Writing to two systems without a distributed transaction means one can succeed while the other fails. The outbox pattern solves this but adds complexity.
  • Maintenance burden. Every schema change, every new table, every new field requires code changes in the application layer. With CDC, new columns appear in the change stream automatically.
  • Incomplete coverage. If someone runs a manual SQL update or a migration script bypasses the application layer, the change is not captured.

Application-level sync is a valid pattern when you need custom transformation logic or when CDC is not available for your source database. For straightforward “keep these two databases consistent” requirements, CDC is simpler and more reliable.

One-Way vs. Bidirectional Synchronization

One-Way (Unidirectional)

One database is the source of truth. Changes flow from source to target. The target is read-only or at least does not write back to the source.

This is the common case: OLTP database to analytics warehouse, primary database to search index, production database to read replica. One-way sync is simpler, has no conflict resolution concerns, and is well-served by any of the methods above.

Bidirectional (Multi-Master)

Both databases accept writes and synchronize changes to the other. This is harder by an order of magnitude, because two users can update the same row at the same time on different databases.

Bidirectional sync is typically needed for:

  • Multi-region deployments where each region has a local writable database.
  • Offline-first applications where mobile clients write to a local database and sync when reconnected.
  • Active-active disaster recovery where both sites handle production traffic.

If you can avoid bidirectional sync, avoid it. Use one-way replication with application-level routing (e.g., all writes go to a single primary, reads can hit any replica). If you genuinely need bidirectional sync, you will need a conflict resolution strategy.

Conflict Resolution Strategies

When two databases accept concurrent writes, conflicts are inevitable. Here are the standard approaches:

Last-write-wins (LWW). Each change carries a timestamp. The most recent write wins. This is simple but can silently discard valid changes. It works when data is rarely updated concurrently and losing an occasional update is acceptable.
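LWW fits in one function. This sketch adds a node-id tiebreak so resolution stays deterministic when timestamps collide; the version shape and node names are invented for illustration.

```python
def lww_resolve(a: dict, b: dict) -> dict:
    """Return the winning version by (timestamp, node) comparison.

    ISO-8601 timestamps in the same format compare correctly as strings;
    the node id breaks exact ties deterministically.
    """
    return a if (a["updated_at"], a["node"]) >= (b["updated_at"], b["node"]) else b

v1 = {"email": "old@example.com",
      "updated_at": "2026-03-01T10:00:00Z", "node": "us-east"}
v2 = {"email": "new@example.com",
      "updated_at": "2026-03-01T10:00:05Z", "node": "eu-west"}
winner = lww_resolve(v1, v2)  # v2 wins; v1's write is silently discarded
```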

Source priority. Designate one database as the “preferred” source. In a conflict, its version wins. This is LWW without the clock-skew risk, but it creates an asymmetry that may not match your application semantics.

Field-level merge. Instead of replacing the entire row, merge at the column level. If database A updates email and database B updates phone_number, keep both changes. This requires tracking which fields changed, not just which rows.
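A three-way field merge makes the tracking requirement concrete: you need the common base version, not just the two replicas. The sketch below is a simplified model with an invented row shape and a pluggable policy for true field-level conflicts.

```python
def merge(base: dict, a: dict, b: dict,
          on_conflict=lambda field, va, vb: va) -> dict:
    """Merge two replicas against their common base, field by field."""
    merged = {}
    for field in base:
        va, vb = a[field], b[field]
        if va == base[field]:
            merged[field] = vb            # only B changed (or neither did)
        elif vb == base[field]:
            merged[field] = va            # only A changed
        else:
            # Both sides changed the same field: a true conflict,
            # resolved by the supplied policy (default: A wins).
            merged[field] = on_conflict(field, va, vb)
    return merged

base = {"email": "x@example.com", "phone": "555-0100"}
a = {"email": "y@example.com", "phone": "555-0100"}   # A changed email
b = {"email": "x@example.com", "phone": "555-0199"}   # B changed phone
result = merge(base, a, b)
```

Because the edits touched different fields, both survive; only a same-field collision invokes the conflict policy.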

Application-level resolution. Surface conflicts to the application or to a human for manual resolution. This is the safest approach for data that cannot tolerate silent overwrites (financial records, medical data) but adds latency and operational burden.

Conflict-free Replicated Data Types (CRDTs). Data structures designed so that concurrent updates always converge without coordination. CRDTs are powerful but constrain the kinds of operations you can perform. They work well for counters, sets, and text editing — less well for arbitrary relational data.
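The simplest CRDT, a grow-only counter (G-Counter), shows the convergence property: each node increments only its own slot, and merge takes the per-node maximum, so replicas reach the same value regardless of merge order. The node names are illustrative.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per node."""

    def __init__(self):
        self.counts: dict[str, int] = {}

    def increment(self, node: str, n: int = 1) -> None:
        self.counts[node] = self.counts.get(node, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Per-node max is commutative, associative, and idempotent,
        # which is what guarantees convergence without coordination.
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter(), GCounter()
a.increment("node-a", 3)   # concurrent updates on different replicas
b.increment("node-b", 2)
a.merge(b)
b.merge(a)
```

The constraint the article mentions is visible here: this structure can only count up; decrements, and arbitrary relational updates generally, need richer (or impossible-to-fit) CRDT designs.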

For most teams, one-way CDC sync with a single source of truth eliminates conflict resolution entirely. If you find yourself designing a conflict resolution strategy, first ask whether you can restructure to avoid the need.

Handling Schema Drift

Databases change. Columns are added, renamed, or dropped. Data types change. Tables are created. A synchronization system that cannot handle schema evolution will break every time a developer runs a migration.

What to look for in a sync tool:

  • Automatic new column propagation. When a column is added to the source, it should appear in the target without manual intervention.
  • Data type mapping. When syncing between different database engines (e.g., PostgreSQL to Snowflake), the tool must correctly map data types. A PostgreSQL jsonb column needs to arrive in Snowflake as a VARIANT, not a truncated string.
  • Non-destructive handling of dropped columns. When a column is dropped from the source, the target should retain the column (with NULLs for new rows) rather than dropping it and losing historical data.
  • DDL event capture. Log-based CDC can capture DDL events (ALTER TABLE, CREATE TABLE) alongside data changes, allowing the sync process to react automatically.
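Cross-engine type mapping is essentially a lookup table plus a safe fallback. The sketch below shows an illustrative subset of a PostgreSQL-to-Snowflake map (real platforms also carry precision, scale, and nullability through):

```python
# Illustrative subset of a PostgreSQL -> Snowflake type map.
PG_TO_SNOWFLAKE = {
    "jsonb": "VARIANT",
    "json": "VARIANT",
    "text": "VARCHAR",
    "integer": "NUMBER(38,0)",
    "numeric": "NUMBER",
    "timestamptz": "TIMESTAMP_TZ",
    "boolean": "BOOLEAN",
    "uuid": "VARCHAR",
}

def map_column(pg_type: str) -> str:
    # Unknown types degrade to VARCHAR rather than failing the pipeline
    # or silently dropping the column.
    return PG_TO_SNOWFLAKE.get(pg_type, "VARCHAR")
```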

Schema drift is one of the biggest operational headaches in database synchronization. Managed CDC platforms like Streamkap handle schema evolution automatically — detecting new columns, mapping types across engines, and applying DDL changes to the target without pipeline restarts.

Architecture Patterns

Pattern 1: Source Database to Analytics Warehouse

The bread-and-butter pattern. Your PostgreSQL or MySQL OLTP database feeds a Snowflake, BigQuery, or ClickHouse warehouse.

Recommended approach: One-way CDC. The warehouse is read-only from the sync perspective. Initial snapshot loads the current state; ongoing CDC streams incremental changes with sub-second capture latency.

Pattern 2: Database to Search Index

An OLTP database feeds an Elasticsearch or search index to power full-text search, faceted filtering, or autocomplete.

Recommended approach: One-way CDC with transformation. The search index typically stores a denormalized view of the data. The sync pipeline joins or reshapes data from multiple source tables into the search document format.

Pattern 3: Database to Cache

Redis or another cache layer sits in front of the database to reduce read latency. The cache must be invalidated or updated when the underlying data changes.

Recommended approach: CDC-driven cache invalidation. Instead of relying on TTLs or application-level cache invalidation (which miss direct database updates), CDC events trigger cache updates or invalidations in real time.
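The invalidation handler is a few lines: each change event evicts the affected cache key, and the next read repopulates from the database. A plain dict stands in for Redis, and the `user:<id>` key scheme is an assumption for the example.

```python
cache = {"user:1": {"name": "Ada"}}

def on_change_event(event: dict) -> None:
    """Evict the cache entry for whichever row this CDC event touched."""
    row = event["after"] or event["before"]   # deletes carry only "before"
    cache.pop(f"user:{row['id']}", None)      # next read refills from the DB

on_change_event({"op": "u",
                 "before": {"id": 1, "name": "Ada"},
                 "after": {"id": 1, "name": "Ada L."}})
```

Unlike TTL-based expiry, this evicts within the CDC pipeline's latency and also catches changes made directly in the database, outside the application.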

Pattern 4: Cross-Region Replication

The same dataset needs to be available in multiple regions for low-latency reads.

Recommended approach: One-way CDC from a primary region to read replicas in other regions. If your database engine supports native replication (PostgreSQL streaming replication, MongoDB replica sets), use it. For cross-engine or cross-cloud replication, a CDC platform fills the gap.

Pattern 5: Database Migration

Moving from one database to another — say, from a self-managed PostgreSQL to a managed MongoDB or from Oracle to PostgreSQL. You need to sync both databases during the transition period while applications are cut over.

Recommended approach: CDC for the transition window. Snapshot the source, start CDC replication, run both databases in parallel, validate consistency, then cut over reads and writes to the new database.

Choosing a Synchronization Tool

The tool landscape breaks into three tiers:

DIY with open-source components. Debezium for CDC, Kafka for transport, Kafka Connect for sinks. Full control, full operational responsibility. You run the infrastructure, handle monitoring, manage connector upgrades, and deal with failures. Appropriate if you have a dedicated data infrastructure team and want to customize every part of the pipeline.

Managed CDC platforms. Streamkap, and similar managed services, run the CDC infrastructure for you. You configure source and target connectors through a UI or API. The platform handles snapshotting, log reading, schema evolution, delivery guarantees, and monitoring. Appropriate when you want CDC reliability without the operational overhead of managing Debezium and Kafka yourself.

Batch ETL tools. Fivetran, Airbyte, Stitch. These tools primarily use polling-based sync with configurable intervals (5 minutes to 24 hours). They work well for analytics use cases where latency in the minutes range is acceptable. They are not the right fit for real-time synchronization.

When evaluating, focus on:

  1. Latency. What is the end-to-end delay from a source commit to the data being available in the target? CDC platforms deliver sub-second capture; batch tools deliver minutes to hours.
  2. Source impact. Does the tool query your production database (polling) or read the transaction log (CDC)? Log-based reads add negligible load.
  3. Schema evolution. Does the tool automatically handle new columns, type changes, and DDL? Or does every migration require manual pipeline changes?
  4. Delivery guarantees. At-least-once? Exactly-once? What happens after a failure — does the tool resume from the last known position or re-snapshot?
  5. Connector coverage. Does the tool support your specific source and target databases? Check for your exact versions and configurations.

Getting Started

If you are designing a new synchronization architecture, start with these defaults:

  • Use one-way sync unless you have a concrete requirement for bidirectional writes.
  • Use log-based CDC as the synchronization method. It is lower impact, lower latency, and more reliable than polling or triggers.
  • Use a managed platform unless you already have a team running Kafka and Debezium in production.
  • Design for schema evolution from the start. Pick a tool that handles new columns automatically.
  • Monitor lag. The single most important metric for any sync pipeline is replication lag — the delay between a source commit and the corresponding target update.

Database synchronization is a solved problem at the infrastructure level. The hard part is not building the pipeline — it is choosing the right architecture for your consistency, latency, and operational requirements, and then running it reliably in production.


Need to keep your databases in sync? Streamkap uses CDC to replicate changes from PostgreSQL, MySQL, MongoDB, and 60+ sources to any destination — with sub-second latency. Start a free trial or see how CDC works.