Replacing Fivetran with Real-Time CDC: A Migration Guide
A practical guide for migrating from Fivetran's batch-based syncs to real-time streaming CDC. Covers latency differences, cost comparison, connector mapping, and step-by-step migration.
Fivetran made ELT accessible. Before tools like it existed, getting data from a production database into a warehouse required custom scripts, cron jobs, and a lot of duct tape. Fivetran replaced that with a clean UI, hundreds of connectors, and a simple promise: set it up once, and your data shows up in your warehouse.
So why would you replace it?
Two reasons keep coming up in conversations with data teams: latency and cost. Fivetran syncs data in batches. Even on its fastest plan, you are looking at 5-minute intervals. For many analytical workloads, that is fine. But if your business is making decisions based on data that is 5, 15, or 60 minutes old, you are operating with a blind spot. And as your data volume grows, Fivetran’s per-row pricing model can push your monthly bill into uncomfortable territory.
This guide walks through how to replace Fivetran with a real-time streaming CDC pipeline, one connector at a time.
Understanding the Latency Gap
The difference between batch sync and streaming CDC is not just “faster.” It is a fundamentally different architecture.
How Fivetran Works
Fivetran connects to your source system on a schedule. Every sync interval (5 minutes, 15 minutes, 1 hour, or 24 hours depending on your plan and configuration), it:
- Queries the source for changes since the last sync (using log-based replication, timestamps, or incremental keys).
- Stages those changes.
- Loads them into your warehouse.
- Updates its internal cursor.
Even when Fivetran uses log-based replication under the hood, it still batches the delivery. The changes are captured from the WAL but buffered and written to the destination on the sync schedule.
Time ──────────────────────────────────────────────────▶
Source DB: ░░░░░ changes happen continuously ░░░░░░░
Fivetran: ──────│ sync │──────│ sync │──────│ sync │
5 min 5 min 5 min
Warehouse: ──────│ data │──────│ data │──────│ data │
appears appears appears
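The batch model above can be sketched as a simple polling loop. This is an illustration of the pattern, not Fivetran's actual implementation: `fetch_changes` and `load` stand in for the source query and warehouse load, and the cursor is whatever incremental key or log position the tool tracks.

```python
def sync_once(fetch_changes, load, cursor):
    """One batch-sync cycle: fetch everything changed since `cursor`,
    load it into the warehouse, then advance the cursor. Between cycles
    (e.g. a 5-minute interval) the warehouse is stale."""
    changes = fetch_changes(cursor)          # query source for new/updated rows
    if changes:
        load(changes)                        # stage and load into the warehouse
        cursor = max(c["updated_at"] for c in changes)  # update internal cursor
    return cursor
```

Streaming CDC removes the loop entirely: instead of asking "what changed since last time?" on a timer, each committed change is pushed downstream as it appears in the transaction log.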
How Streaming CDC Works
Streaming CDC reads directly from the database’s transaction log and delivers each change downstream as it is committed. There is no sync interval. The data flows continuously.
Time ──────────────────────────────────────────────────▶
Source DB: ░░░░░ changes happen continuously ░░░░░░░
Streaming CDC: ░░░░░ changes delivered continuously ░░░░
Warehouse: ░░░░░ data available continuously ░░░░░░░
The practical impact: a row updated in your production PostgreSQL database at 10:00:00 AM might appear in your Snowflake warehouse at 10:00:03 with streaming CDC. With Fivetran on a 5-minute sync, that same row would not appear until somewhere between 10:00:00 and 10:05:00, depending on when the next sync runs.
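The staleness math is worth making explicit. A change committed at a random moment waits anywhere from zero to one full interval for the next sync, so on a periodic schedule:

```python
def expected_staleness_s(sync_interval_s, load_time_s=0.0):
    """Staleness of a row under a periodic batch sync: half an interval
    on average, a full interval in the worst case, plus load time."""
    return {"average": sync_interval_s / 2 + load_time_s,
            "worst_case": sync_interval_s + load_time_s}
```

On a 5-minute sync that is 2.5 minutes of staleness on average and 5 minutes in the worst case, before counting load time; streaming CDC replaces this with a delivery delay typically measured in seconds.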
When Latency Matters
For monthly financial reports, 5-minute latency is irrelevant. But for these use cases, it matters a lot:
- Operational dashboards: Your support team is looking at a dashboard to answer a customer question. If the data is 5 minutes old, they might give wrong answers.
- Inventory and order management: An item sold out 3 minutes ago, but the inventory count still shows available because the sync has not run yet.
- Fraud and anomaly detection: A suspicious transaction happened 4 minutes ago. With batch sync, your detection system will not even see it until the next sync completes.
- AI agent workflows: Agents querying your warehouse for real-time context need fresh data. Minutes-old data can lead to incorrect decisions and poor user experiences.
- Data-driven feature flags: A feature flag tied to a user’s recent activity will not reflect the latest behavior if the underlying data is stale.
The Cost Comparison
Fivetran charges based on Monthly Active Rows (MARs): the number of unique rows that are synced or updated in a billing month. This model is straightforward at small scale but can become expensive as volume grows.
How MARs Pricing Works
Every row that is inserted, updated, or deleted in your source and synced by Fivetran counts as a MAR. If a single row is updated 10 times in a month, it counts as 1 MAR. But if you have 10 million rows that each get updated once, that is 10 million MARs.
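Because MARs count distinct rows rather than change events, you can estimate your bill from a month of change history. A minimal sketch, assuming each change event identifies its table and primary key:

```python
def estimate_mars(change_events):
    """Estimate Monthly Active Rows from a month's change events.
    A row updated many times counts once, so MARs is the number of
    DISTINCT (table, primary key) pairs touched in the month."""
    return len({(e["table"], e["pk"]) for e in change_events})
```

One row updated 10 times contributes 1 MAR; 10 million rows updated once each contribute 10 million.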
Fivetran’s pricing tiers (as of early 2026) break down roughly as follows:
| Tier | MARs Included | Approximate Cost |
|---|---|---|
| Free | 500,000 | $0 |
| Starter | 2,000,000 | ~$500/month |
| Standard | Custom | ~$1.00 - $1.50 per 1,000 MARs |
| Enterprise | Custom | Negotiable |
Streaming CDC Pricing Models
Streaming CDC platforms typically price based on one or more of:
- Data volume (GB transferred per month)
- Number of connectors (source-destination pairs)
- Compute resources consumed
The important distinction: most streaming CDC platforms do not charge per row. If you have a high-change-rate database where millions of rows are updated daily, the cost difference can be dramatic.
Example Cost Comparison
Consider a mid-size SaaS application with:
- 3 PostgreSQL databases
- 50 tables being synced
- 20 million rows updated per month
- Growing 30% year-over-year
| Cost Factor | Fivetran (Standard) | Streaming CDC (e.g., Streamkap) |
|---|---|---|
| 20M MARs | ~$20,000 - $30,000/month | N/A (volume-based) |
| 3 source connectors | Included | Included in plan |
| Infrastructure | Managed by Fivetran | Managed by provider |
| Estimated monthly cost | $20,000 - $30,000 | $2,000 - $5,000 |
These numbers are approximate and will vary based on your specific contract and usage patterns. The point is not the exact figures but the shape of the curve: per-row pricing scales linearly with your data growth, while volume-based pricing tends to grow more slowly.
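The shape of that curve can be made concrete with a back-of-the-envelope projection. The rates here are midpoints of the approximate figures in the tables above, and the assumption that volume-based pricing grows more slowly (10% per year here) is illustrative, not a quote from any vendor:

```python
def project_costs(mars_now, years, growth=0.30,
                  per_1k_mars=1.25, volume_cost_now=3500.0, volume_growth=0.10):
    """Illustrative comparison: per-row (MAR) cost scales linearly with
    data growth, while volume-based cost is assumed to grow more slowly.
    Returns (year, mar_cost, volume_cost) per year."""
    rows = []
    mars, vol = mars_now, volume_cost_now
    for year in range(years + 1):
        rows.append((year, mars / 1000 * per_1k_mars, vol))
        mars *= 1 + growth   # 30% YoY data growth
        vol *= 1 + volume_growth
    return rows
```

At 20M MARs growing 30% per year, the per-row line starts around $25,000/month and compounds with the data; under these assumptions the gap only widens over time.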
Connector Mapping
The most common Fivetran connectors have direct equivalents in the streaming CDC world. Here is how they map:
Database Sources
| Fivetran Connector | Streaming CDC Equivalent | Notes |
|---|---|---|
| PostgreSQL | PostgreSQL CDC (WAL-based) | Direct equivalent; uses logical replication |
| MySQL | MySQL CDC (binlog-based) | Direct equivalent; uses binlog |
| SQL Server | SQL Server CDC | Uses SQL Server’s native CDC or Change Tracking (CT) |
| MongoDB | MongoDB CDC (change streams) | Uses MongoDB change streams |
| Oracle | Oracle CDC (LogMiner/XStream) | Requires additional Oracle licensing |
SaaS Sources
This is where the picture gets more nuanced. Fivetran has connectors for hundreds of SaaS applications (Salesforce, HubSpot, Stripe, etc.). Streaming CDC is primarily a database technology. For SaaS sources, you have a few options:
- Keep Fivetran for SaaS sources. There is nothing wrong with a hybrid approach. Use streaming CDC for your databases and Fivetran for SaaS connectors where latency is less critical.
- Use webhook-based ingestion. Many SaaS applications offer webhooks that can trigger real-time data capture. Platforms like Streamkap support webhook sources that can ingest these events in real time.
- Use the SaaS API directly. Build a lightweight service that polls the SaaS API and publishes changes to Kafka, then use streaming CDC for the onward delivery.
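The third option boils down to a cursor-based polling loop, much like a batch sync but run on a tight schedule and publishing individual events. A minimal sketch, where `fetch_page` and `publish` are hypothetical stand-ins for your SaaS API client and Kafka producer:

```python
import json

def poll_saas(fetch_page, publish, cursor):
    """Poll a SaaS API for records updated since `cursor` and publish
    each one onward (e.g. to a Kafka topic). Persist the returned
    cursor so the next poll only fetches new changes."""
    records = fetch_page(cursor)             # e.g. GET /objects?updated_since=...
    for rec in records:
        publish(json.dumps(rec))             # produce one event per record
    if records:
        cursor = max(r["updated_at"] for r in records)
    return cursor
```

This keeps SaaS data on the same streaming backbone as your database CDC, at the cost of building and operating the poller yourself.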
Destination Support
Fivetran supports a wide range of destinations. Verify that your streaming CDC platform supports yours:
| Destination | Fivetran | Streamkap |
|---|---|---|
| Snowflake | Yes | Yes |
| BigQuery | Yes | Yes |
| Databricks | Yes | Yes |
| Redshift | Yes | Yes |
| ClickHouse | Yes | Yes |
| Kafka | No (Fivetran is source-to-warehouse) | Yes |
| PostgreSQL (as destination) | Yes | Yes |
One advantage of streaming CDC platforms: many support Kafka as a destination, which means you can use the same pipeline to feed both your warehouse and your real-time applications.
Step-by-Step Migration
Step 1: Pick Your First Connector
Do not try to migrate everything at once. Pick a single connector that meets these criteria:
- High change volume: This is where you will see the biggest cost savings.
- Latency-sensitive: This is where you will see the biggest operational benefit.
- Well-understood: Pick a table you know well, so you can easily validate correctness.
A good first candidate is usually your main application database’s high-traffic tables: orders, events, user activity.
Step 2: Set Up the Streaming CDC Pipeline
Configure your streaming CDC platform to connect to the source database and write to your warehouse. For a PostgreSQL-to-Snowflake pipeline on Streamkap, this typically involves:
- Configure the source: Provide the database connection details, select the tables to capture, and configure the replication slot.
- Configure the destination: Provide the Snowflake account, warehouse, database, and schema details.
- Map the schema: Define how source tables map to destination tables. Most platforms will auto-create destination tables matching the source schema.
- Start the pipeline: The service performs an initial snapshot and then begins streaming real-time changes.
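Putting those four steps together, a pipeline definition tends to look something like the sketch below. The field names and values here are hypothetical (platforms differ); the point is the shape: a source block, a destination block, and a snapshot-then-stream mode.

```python
# Hypothetical pipeline definition -- field names vary by platform.
pipeline = {
    "source": {
        "type": "postgresql",
        "host": "db.internal.example.com",
        "database": "app",
        "tables": ["public.orders", "public.users", "public.events"],
        "replication_slot": "streamkap_slot",  # logical replication slot
    },
    "destination": {
        "type": "snowflake",
        "account": "acme-prod",
        "warehouse": "LOADING_WH",
        "database": "ANALYTICS",
        "schema": "STREAMING_SCHEMA",
    },
    "snapshot": "initial",  # backfill existing rows, then stream from the WAL
}
```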
Step 3: Write to New Tables
Do not overwrite the tables that Fivetran is managing. Instead, configure the streaming CDC pipeline to write to a separate schema or table prefix:
-- Fivetran-managed tables
fivetran_schema.orders
fivetran_schema.users
fivetran_schema.events
-- Streaming CDC tables
streaming_schema.orders
streaming_schema.users
streaming_schema.events
This lets you compare outputs side by side.
Step 4: Validate Data Correctness
Run validation queries to compare the two sets of tables:
-- Compare row counts
SELECT 'fivetran' AS source, COUNT(*) AS row_count FROM fivetran_schema.orders
UNION ALL
SELECT 'streaming' AS source, COUNT(*) AS row_count FROM streaming_schema.orders;
-- Compare recent records
SELECT f.id, f.updated_at as fivetran_updated, s.updated_at as streaming_updated
FROM fivetran_schema.orders f
FULL OUTER JOIN streaming_schema.orders s ON f.id = s.id
WHERE f.updated_at != s.updated_at
OR f.id IS NULL
OR s.id IS NULL
LIMIT 100;
The streaming CDC tables should be more up-to-date than the Fivetran tables. Small count differences are expected because the streaming pipeline delivers changes faster. What you are looking for is that the streaming tables are a superset of the Fivetran tables (they have everything Fivetran has, plus more recent changes).
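If you pull the primary-key sets from both schemas, the superset check can be asserted programmatically. A small helper (hypothetical, not part of any platform) that encodes the expected relationship:

```python
def check_superset(streaming_ids, fivetran_ids):
    """The streaming tables should contain every row Fivetran has.
    Rows only in the streaming tables are expected (changes Fivetran
    has not synced yet); rows only in Fivetran's tables are a problem."""
    missing = set(fivetran_ids) - set(streaming_ids)   # should be empty
    ahead = set(streaming_ids) - set(fivetran_ids)     # fresher changes, OK
    return {"ok": not missing, "missing": missing, "ahead_by": len(ahead)}
```

Run it a few times across a sync boundary: `ahead_by` should drop toward zero right after a Fivetran sync completes, then grow again as new changes stream in.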
Step 5: Migrate Downstream Consumers
Once you are confident in the data quality:
- Update your dbt models, views, or queries to point at the streaming tables.
- Update any BI tool connections (Looker, Tableau, Metabase) to use the new schema.
- Run your test suite or manual checks against the new data.
Step 6: Decommission the Fivetran Connector
After running on the streaming CDC pipeline for at least a week with no issues:
- Pause the Fivetran connector (do not delete it yet).
- Wait another few days to confirm everything is stable.
- Delete the Fivetran connector.
- Drop the old Fivetran-managed tables once you are confident they are no longer needed.
Step 7: Repeat for the Next Connector
Move to the next connector on your list. Each subsequent migration will be faster because your team has already built the validation process and understands the tooling.
What Changes in Your Workflow
Schema Changes
With Fivetran, schema changes in the source database are detected at the next sync and propagated to the destination (usually by adding columns, never by deleting them). With streaming CDC, schema changes are detected immediately and propagated in real time.
However, different platforms handle this differently. Some will automatically add columns. Some will pause the pipeline and alert you. Understand your platform’s behavior before relying on automatic schema evolution in production.
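The additive policy both approaches share is simple to reason about: new source columns get added, removed source columns are retained. A toy sketch of that rule:

```python
def diff_schema(source_cols, dest_cols):
    """Additive schema evolution: columns new in the source are added to
    the destination; columns dropped from the source are kept in the
    destination rather than dropped (no destructive changes)."""
    to_add = [c for c in source_cols if c not in dest_cols]
    kept = [c for c in dest_cols if c not in source_cols]  # retained, not dropped
    return to_add, kept
```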
Monitoring and Alerting
Fivetran provides a dashboard showing sync status, row counts, and error logs. Your streaming CDC platform will have equivalent monitoring, but the metrics are different. Instead of “last sync time” and “rows synced per sync,” you will see:
- Current latency: How far behind is the pipeline from the source?
- Throughput: How many events per second is the pipeline processing?
- Pipeline health: Is the connection to the source and destination active?
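"Current latency" is typically derived per event as the gap between the source commit timestamp and the moment the change is queryable downstream, then aggregated as a percentile. A minimal sketch, assuming events carry both timestamps in epoch seconds:

```python
def lag_percentile(events, pct=0.99):
    """Pipeline lag at a given percentile: for each delivered event,
    lag = time applied downstream minus time committed at the source.
    Alert on p99 rather than the average, which hides stragglers."""
    lags = sorted(e["applied_at"] - e["committed_at"] for e in events)
    idx = min(len(lags) - 1, int(pct * len(lags)))
    return lags[idx]
```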
Cost Forecasting
With Fivetran, you can predict costs by estimating your MAR count. With streaming CDC, you need to estimate your data volume (GB) or event count. Track your initial month’s usage closely to build a baseline for forecasting.
When to Keep Fivetran
Streaming CDC is not the right choice for every connector. Keep Fivetran (or a similar batch tool) for:
- SaaS sources that do not have a change log (Salesforce, HubSpot, Google Analytics). Streaming CDC requires a transaction log, and most SaaS APIs do not expose one.
- Low-change-volume sources where the cost difference is negligible and the latency difference does not matter.
- Sources where your team has no database access and can only connect through an API.
A hybrid approach is perfectly reasonable. Use streaming CDC for your high-volume, latency-sensitive database sources, and keep Fivetran for your SaaS connectors.
Streamkap is built for exactly this kind of workload: high-volume database CDC with real-time delivery to analytical destinations. If your Fivetran bill is growing faster than your data team’s budget, or if your business needs data fresher than what batch syncs can deliver, a streaming CDC platform is worth evaluating.