MCP and CDC: The Two Protocols AI Agents Need for Live Data
Model Context Protocol defines how agents ask for data. Change Data Capture defines how data stays fresh. Together, they give agents real-time access to your databases without hammering production systems.
AI agents need two things from your data infrastructure: a way to ask for data, and a guarantee that the data is current. These are separate problems that require separate solutions.
The Model Context Protocol (MCP) solves the first problem. It gives agents a standard interface to discover and query data, regardless of where that data lives.
Change Data Capture (CDC) solves the second problem. It keeps a copy of your database data continuously up to date by streaming every change as it happens.
Together, they form the data access layer that production AI agents need. MCP defines the interface. CDC defines the freshness. This article explains both protocols, how they work together, and when to use them.
The Problem: Agents Need Fresh Data, But Production Databases Cannot Handle the Load
Every AI agent application eventually hits the same wall. The agent needs data from your production database. The obvious approach is to let the agent query the database directly.
This works in development. It fails in production.
Here is why. Your production PostgreSQL database serves your application: web requests, API calls, background jobs. It has a connection pool sized for that workload, maybe 100 to 200 connections. It has query patterns it is optimized for, typically simple lookups by primary key or indexed columns.
Now add 20 AI agents. Each one makes 5 to 10 queries per request. The queries are not the simple lookups your database is optimized for. Agents ask questions like “show me all orders for this customer in the last 90 days with items matching these categories,” which translates to joins across multiple tables with date range filters.
Suddenly your connection pool is contended. Application queries that took 5 milliseconds now take 500 milliseconds because they are waiting for connections. Your monitoring lights up red. Your customers see slow page loads. And you only have 20 agents. What happens when you have 200?
The answer is: you do not let agents query the production database. You give them their own copy of the data, kept fresh by CDC, accessed through MCP.
Change Data Capture: How Data Stays Fresh
CDC is not new technology. Databases have used it internally for replication since the 1990s. What changed is that tools like Debezium and managed platforms like Streamkap made it accessible for external consumers.
How Log-Based CDC Works
Every transactional database writes a log of all changes before applying them. PostgreSQL calls it the Write-Ahead Log (WAL). MySQL calls it the Binary Log (binlog). This log exists for crash recovery and replication, and it contains a complete, ordered record of every insert, update, and delete.
Log-based CDC reads this log and produces a stream of change events. For each change, you get:
- The operation type: insert, update, or delete
- The before state: what the row looked like before the change (for updates and deletes)
- The after state: what the row looks like after the change (for inserts and updates)
- Metadata: timestamp, transaction ID, table name, schema information
This is fundamentally different from query-based change detection (polling the table for changes), which misses deletes, has race conditions, and puts load on the source database. Log-based CDC is passive: it reads a log that the database is already writing. The impact on the source system is negligible.
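To make the event shape concrete, here is a sketch of one update event as a Python dict. The envelope loosely follows the Debezium convention, but exact field names vary by CDC tool, so treat every key here as illustrative:

```python
# Illustrative shape of a log-based CDC change event for an order update.
# Field names loosely follow the Debezium envelope; exact keys vary by tool.
change_event = {
    "op": "u",  # operation: "c" = insert, "u" = update, "d" = delete
    "before": {"order_id": 1001, "status": "processing", "customer_id": 42},
    "after":  {"order_id": 1001, "status": "shipped",    "customer_id": 42},
    "source": {
        "table": "orders",
        "ts_ms": 1714060800000,  # when the change was committed
        "txId": 558234,          # transaction that produced it
    },
}

# Because both states are present, a consumer can see exactly what changed.
changed_fields = {
    k for k in change_event["after"]
    if change_event["before"].get(k) != change_event["after"][k]
}
print(changed_fields)  # -> {'status'}
```

A polling approach could only observe the final row state; the before/after pair is what makes audit trails and temporal reasoning possible.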
CDC in Practice
A typical CDC pipeline looks like this:
- Source database writes changes to its transaction log (automatic, no configuration needed)
- CDC connector reads the log and produces change events (Streamkap manages this)
- Streaming transport moves events from source to destination (managed by the platform)
- Destination store receives the events and updates its copy of the data (Redis, Elasticsearch, another database, a warehouse)
The result: the destination store always has a copy of the source data that is current to within 1 to 2 seconds. Agents query the destination instead of the source. The production database is untouched.
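The final step of that pipeline, applying change events to the destination, can be sketched in a few lines. An in-memory dict stands in for the destination store (Redis in practice), and the event shape is a simplified assumption rather than any particular connector's format:

```python
# Minimal sketch of the "destination store" step: applying CDC change events
# to an in-memory dict standing in for Redis. Event shape is illustrative.
store = {}  # (table, primary key) -> latest row state

def apply_change(event):
    """Apply one insert/update/delete event to the local copy."""
    key = (event["source"]["table"], event["key"])
    if event["op"] == "d":
        store.pop(key, None)         # delete: remove the row entirely
    else:
        store[key] = event["after"]  # insert/update: keep the latest state

events = [
    {"op": "c", "key": 1001, "source": {"table": "orders"},
     "after": {"order_id": 1001, "status": "pending"}},
    {"op": "u", "key": 1001, "source": {"table": "orders"},
     "after": {"order_id": 1001, "status": "shipped"}},
    {"op": "d", "key": 900, "source": {"table": "orders"}, "after": None},
]
for e in events:
    apply_change(e)

print(store[("orders", 1001)]["status"])  # -> shipped
```

Applying events in log order is what keeps the copy consistent: the destination converges to exactly the state the source database holds.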
What CDC Captures
CDC captures every data change, not just the final state. This matters for several reasons:
Audit trails. You see every intermediate state, not just the current one. If an order went from “pending” to “processing” to “shipped” to “delivered,” CDC captured all four transitions.
Temporal queries. Agents can reason about what changed, not just what the current state is. “This customer’s plan was downgraded 3 hours ago” is more informative than “this customer is on the free plan.”
Event-driven actions. You can trigger agent actions based on specific changes. When an order status changes to “delayed,” notify the customer’s support agent proactively.
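As a sketch of the event-driven case, the following hypothetical handler watches the change stream for the transition into a "delayed" status and fires a callback only on that transition (the field names and handler are assumptions for illustration):

```python
# Sketch of event-driven actions on a CDC stream: fire a handler only when
# an order transitions into "delayed". Event shape is illustrative.
notifications = []

def on_order_delayed(order):
    notifications.append(f"Order {order['order_id']} is delayed; notify support agent")

def handle_event(event):
    before = event.get("before") or {}
    after = event.get("after") or {}
    # Trigger on the transition into "delayed", not on every subsequent event.
    if before.get("status") != "delayed" and after.get("status") == "delayed":
        on_order_delayed(after)

handle_event({
    "op": "u",
    "before": {"order_id": 1001, "status": "shipped"},
    "after": {"order_id": 1001, "status": "delayed"},
})
print(notifications[0])
```

Comparing before and after states is the key move here; with only the current state, you could not distinguish "just became delayed" from "has been delayed for hours."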
Model Context Protocol: How Agents Ask for Data
MCP is newer than CDC. Announced by Anthropic in late 2024, it has quickly become the standard for agent-to-data communication. Think of it as a universal adapter between AI agents and data sources.
The Three MCP Primitives
MCP defines three types of capabilities that a server can expose:
Resources are data that the agent can read. A customer record, an order history, a support ticket. Resources have URIs (like customer://42 or orders://customer/42/recent) and return structured data. The agent discovers available resources through the MCP server’s capability listing.
Tools are functions that the agent can call. Search for customers by email, calculate shipping estimates, check inventory levels. Tools have defined input schemas and return structured results. The agent calls them as part of its reasoning process.
Prompts are templates that the agent can use. Standard ways of formatting data for specific tasks, like summarizing a customer’s history or generating a support response. Prompts provide context-specific framing for the agent’s output.
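A toy registry makes the three primitives concrete. This is deliberately not the real MCP SDK: the class, decorators, and shapes below are invented for illustration, but they show how resources, tools, and discoverability fit together:

```python
# Conceptual sketch of MCP primitives as a plain-Python registry.
# Not the real MCP SDK; all names and shapes here are illustrative.
class ToyMCPServer:
    def __init__(self):
        self.resources = {}  # URI pattern -> reader function
        self.tools = {}      # tool name   -> (input schema, function)

    def resource(self, uri):
        def register(fn):
            self.resources[uri] = fn
            return fn
        return register

    def tool(self, name, schema):
        def register(fn):
            self.tools[name] = (schema, fn)
            return fn
        return register

    def list_capabilities(self):
        # Discoverability: an agent can ask what the server exposes.
        return {"resources": list(self.resources), "tools": list(self.tools)}

server = ToyMCPServer()

@server.resource("orders://customer/{id}/recent")
def recent_orders(id):
    return [{"order_id": 1001, "status": "shipped"}]  # would query the data store

@server.tool("search_customers", schema={"email": "string"})
def search_customers(email):
    return [{"customer_id": 42, "email": email}]

print(server.list_capabilities()["resources"])  # -> ['orders://customer/{id}/recent']
```

The point of the sketch is the registry pattern: the agent never hardcodes what data exists; it asks the server, then reads resources or calls tools by name.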
Why MCP Matters for Agent Architecture
Before MCP, every agent-to-data integration was custom. Want your agent to query PostgreSQL? Write a tool. Want it to query Redis? Write another tool. Elasticsearch? Another tool. Each one with its own error handling, authentication, and data formatting.
MCP replaces this with a single protocol. An MCP server exposes data from any backing store through a consistent interface. The agent does not need to know whether the data is in Redis, PostgreSQL, or Elasticsearch. It queries the MCP server, which handles the translation.
This has practical benefits:
Reduced integration code. One MCP client in your agent replaces dozens of custom tool implementations.
Discoverability. Agents can ask an MCP server “what data do you have?” and get a structured response. They do not need hardcoded knowledge of every data source.
Access control. The MCP server can enforce fine-grained permissions. Agent A gets access to customer names and order statuses; Agent B additionally gets billing details. The access control lives in the MCP server, not scattered across individual tools.
Composability. An agent can connect to multiple MCP servers simultaneously. One for customer data, one for product catalog, one for shipping status. Each server manages its own data domain.
MCP + CDC: The Complete Architecture
Here is how the two protocols work together:
Source Database
|
| (transaction log)
v
CDC Platform (Streamkap)
|
| (change events, sub-second)
v
Agent Data Store (Redis, Elasticsearch, etc.)
|
| (MCP protocol)
v
MCP Server
|
| (queries and responses)
v
AI Agent
Each layer has a specific responsibility:
CDC guarantees freshness. Every change in the source database appears in the agent data store within seconds. No scheduled syncs, no stale caches, no manual refreshes.
The agent data store provides query performance. It is optimized for the access patterns agents need: lookups by customer ID, searches across orders, aggregations over recent activity. It can be a different technology than the source database, chosen for query speed rather than transactional correctness.
MCP provides the interface. Agents connect to the MCP server with a standard protocol, discover available data, and query it. The MCP server translates agent queries into efficient queries against the data store.
The agent makes decisions based on current data. It trusts that the data it receives through MCP is fresh (because CDC keeps it that way) and complete (because CDC captures every change).
A Concrete Example
Let us walk through a specific scenario: an agent needs to answer a customer’s question about their recent order.
Without CDC + MCP:
- Customer asks: “Has my order shipped?”
- Agent calls a custom Lambda function to query the production database
- The query hits the production connection pool (which is already busy serving the application)
- The response comes back after 200ms (or 2000ms if the pool is contended)
- The agent responds
Scale this to 100 concurrent customers, and the production database buckles under the extra query load.
With CDC + MCP:
- Customer asks: “Has my order shipped?”
- Agent queries the MCP server: get_resource("orders://customer/42/recent")
- MCP server queries Redis (which received the latest order update from CDC 1.2 seconds ago)
- Redis responds in under 1ms
- MCP server returns the structured order data to the agent
- Agent responds
The production database never saw the query. Redis handled it in under a millisecond. The data was 1.2 seconds old, which is fresher than any batch or micro-batch approach could deliver.
Now scale this to 100 concurrent customers. Redis handles 100,000 queries per second without breaking a sweat. The production database is completely unaffected.
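The fast path in this flow can be sketched in plain Python: the MCP server parses the resource URI and answers from the CDC-maintained copy, never touching the production database. The URI format and store layout are assumptions for illustration:

```python
# Sketch of the fast path: resolving a resource URI against the
# CDC-maintained copy. URI scheme and store layout are illustrative.
recent_orders_by_customer = {  # kept fresh by the CDC pipeline
    42: [{"order_id": 1001, "status": "shipped", "updated_s_ago": 1.2}],
}

def get_resource(uri):
    # Expected form: orders://customer/<id>/recent
    scheme, _, path = uri.partition("://")
    if scheme != "orders":
        raise ValueError(f"unsupported scheme: {scheme}")
    parts = path.split("/")  # ["customer", "42", "recent"]
    customer_id = int(parts[1])
    return recent_orders_by_customer.get(customer_id, [])

orders = get_resource("orders://customer/42/recent")
print(orders[0]["status"])  # -> shipped
```

Every request resolves to a dictionary lookup against data that CDC updated seconds ago, which is why the latency stays in the sub-millisecond range.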
When to Use MCP vs Direct API Access vs Warehouse Queries
MCP is not always the right choice. Here is when to use each approach:
Use MCP When:
- Multiple agents need access to the same data sources
- You want standard, discoverable data access
- You need fine-grained access control per agent
- The data is operational (current state matters)
- Latency requirements are under 5 seconds
MCP is best for production agent systems that need structured, controlled access to live data.
Use Direct API Access When:
- You have one agent accessing one data source
- The data source already has a well-documented API
- You do not need discoverability (the agent’s tools are hardcoded)
- You are prototyping and speed of development matters more than architecture
Direct APIs are simpler to set up but do not scale well across multiple agents and data sources.
Use Warehouse Queries When:
- The agent needs historical analysis, not current state
- Latency of minutes to hours is acceptable
- You need complex analytical queries (aggregations, window functions, joins across many tables)
- The questions are retrospective: “what happened last quarter” rather than “what is happening now”
Warehouses are the wrong choice for real-time agent access but the right choice for analytical agent workloads.
The Hybrid Approach
Most production systems use all three. MCP for real-time operational data (current order status, account details). Direct APIs for external services (payment processing, shipping carriers). Warehouse queries for analytical context (customer lifetime value, trend analysis).
The key is matching each data access pattern to the right approach. Real-time data through MCP backed by CDC. Analytical data through warehouse queries. External service data through APIs.
Setting Up MCP with CDC: A Practical Path
If you are starting from scratch, here is the order of operations:
Week 1: CDC pipeline. Set up Streamkap to capture changes from your primary database and stream them to a low-latency store (Redis for simple lookups, Elasticsearch for search-heavy access patterns). Verify that data is flowing and fresh.
Week 2: MCP server. Deploy an MCP server that exposes the data in your agent data store. Streamkap provides a built-in MCP server, or you can build your own using the MCP SDK. Define resources for the data your agents need (customers, orders, products).
Week 3: Agent integration. Connect your agent framework to the MCP server. Test with a single agent and a few queries. Verify that the agent receives current data and can act on it correctly.
Week 4: Production hardening. Add monitoring (data freshness alerts, MCP query latency tracking, agent error rates), access controls (which agents can access which resources), and error handling (what happens when the CDC stream falls behind or the MCP server is unavailable).
This timeline assumes a small team (1 to 2 engineers) working with managed infrastructure. Self-hosting the streaming layer adds 4 to 8 weeks of additional setup and ongoing operational work.
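One piece of that week-4 hardening, a data-freshness check, can be sketched as follows. The threshold and timestamp field are assumptions for illustration, not Streamkap specifics:

```python
# Sketch of a data-freshness check: compare the newest change-event
# timestamp applied to the agent data store against the clock and flag
# staleness past a threshold. Threshold and field names are assumptions.
import time

FRESHNESS_THRESHOLD_S = 5.0  # alert if the copy lags the source by more than this

def freshness_lag(last_event_ts_ms, now_s=None):
    """Seconds between the latest applied change event and now."""
    now_s = time.time() if now_s is None else now_s
    return now_s - last_event_ts_ms / 1000.0

def check_freshness(last_event_ts_ms, now_s=None):
    lag = freshness_lag(last_event_ts_ms, now_s)
    return {"lag_s": lag, "stale": lag > FRESHNESS_THRESHOLD_S}

# A copy updated 1.2 seconds ago is fine; one updated 60 seconds ago is stale.
print(check_freshness(1_000_000, now_s=1001.2)["stale"])  # -> False
print(check_freshness(1_000_000, now_s=1060.0)["stale"])  # -> True
```

Wiring this check into an alert is what lets agents (or their operators) detect a lagging CDC stream before stale answers reach customers.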
The Future of Agent Data Access
MCP and CDC are both early in their adoption curves, but the direction is clear. Agents will not query production databases directly in production systems. They will access data through standardized protocols (MCP) backed by continuous freshness (CDC).
The teams that set up this infrastructure now will have a permanent advantage: their agents will always have better data than competitors’ agents that rely on batch ETL or direct database queries. Better data means better decisions. Better decisions mean better outcomes.
The protocols exist. The infrastructure is available as managed services. The only thing left is connecting them.
Ready to give your AI agents access to fresh, real-time data? Streamkap streams database changes to your agent data stores with sub-second latency, providing the CDC layer that makes MCP trustworthy. Start a free trial or learn more about AI/ML pipelines.