From Data Freshness to Context Freshness: The Next Wave of Real-Time Infrastructure
Wave 1 was fresh data (CDC to warehouses). Wave 2 is fresh context (streaming semantic layers for agents). Here's why context freshness is the next frontier and what it means for your data stack.
In 2015, if you asked a data engineer about real-time data, they would describe the dream of replacing nightly batch ETL jobs with something faster. Dashboards that updated every hour instead of every morning. Reports that reflected yesterday’s data instead of last week’s. The gap between “something happens in production” and “an analyst can see it” was measured in hours or days.
By 2023, that dream was largely realized. CDC platforms like Debezium, Streamkap, and managed offerings from the major cloud providers made it possible to stream database changes to warehouses in seconds. Kafka became the central nervous system of data infrastructure. Real-time dashboards went from aspirational to expected.
This was Wave 1 of real-time infrastructure. It was about moving data faster. And it worked. But it turns out that fresh data alone doesn’t solve the problems that matter most in 2026.
Wave 1: Fresh Data (2015 to 2023)
The first wave was defined by a simple metric: data latency. How long does it take for a change in a production database to be queryable in an analytical system?
The starting point: Nightly batch ETL. A cron job runs at 2 AM, extracts everything from the production database, transforms it, and loads it into the warehouse. Data latency: 12 to 24 hours.
The middle ground: Micro-batch ETL. Tools like Fivetran and Airbyte poll source databases every 5 to 15 minutes. Data latency: 15 minutes to 1 hour.
The destination: Streaming CDC. Debezium, Streamkap, and similar tools read the database’s transaction log and stream changes as they happen. Data latency: 1 to 5 seconds.
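At the streaming-CDC end of that progression, the unit of work is a single change event read from the database's transaction log. A minimal sketch of what one might look like, loosely modeled on the Debezium envelope (field names and values here are illustrative, not a spec):

```python
import json
import time

# Illustrative change event, loosely modeled on the Debezium envelope.
# Field names and values are assumptions for illustration.
change_event = {
    "source": {"db": "shop", "table": "orders", "lsn": 914021},
    "op": "u",  # c = insert, u = update, d = delete
    "before": {"id": 42, "status": "pending"},
    "after": {"id": 42, "status": "shipped"},
    "ts_ms": int(time.time() * 1000),  # commit timestamp
}

def data_latency_ms(event, queryable_at_ms):
    """Wave 1's metric: time from commit to queryable."""
    return queryable_at_ms - event["ts_ms"]

print(json.dumps(change_event["after"]))
```

The `ts_ms` field is what makes the Wave 1 success metric measurable at all: data latency is simply the gap between the commit timestamp and the moment the row is queryable downstream.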
The primary consumer in Wave 1 was a human analyst looking at a dashboard. And for that use case, the progression from daily to hourly to sub-minute freshness was genuinely valuable. Operational dashboards became possible. Finance teams could close the books faster. Support teams could see recent customer activity without waiting for the next batch load.
The companies that won Wave 1 did so by making data movement reliable, affordable, and easy to set up. The technology challenge was significant: reading transaction logs from production databases without impacting performance, handling schema evolution, ensuring exactly-once delivery, and managing backfills is non-trivial engineering. But the problem was well-defined: move bytes from A to B, fast.
Success metric for Wave 1: Data latency in seconds. How fast does a committed transaction become queryable?
The Limits of Fresh Data
By the early 2020s, the leading organizations had achieved sub-second data latency. Their warehouses reflected production reality within seconds. And yet, the analysts and decision-makers consuming that data were still struggling.
Why? Because fresh data alone doesn’t tell you what the data means.
A warehouse table updated in real time is still just rows and columns. If you don’t know that the amount column in the orders table represents gross revenue before tax and returns, you’ll calculate the wrong total. If you don’t know that the customers table includes test accounts, your customer count will be wrong. If you don’t know that the fiscal year starts in February, not January, your quarterly reports will slice the data incorrectly.
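To make those failure modes concrete, here is a hedged sketch (tables, column names, and business rules are invented for illustration) showing how the same fresh rows yield different answers depending on whether the business context is applied:

```python
# Hypothetical rows from a real-time-replicated warehouse.
orders = [
    {"customer_id": 1, "amount": 100.0, "tax": 8.0, "returned": False},
    {"customer_id": 2, "amount": 50.0,  "tax": 4.0, "returned": True},
]
customers = [
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "qa-test@internal.example"},  # test account
]

# Context-free reading: sum the amount column, count every row.
naive_revenue = sum(o["amount"] for o in orders)
naive_customers = len(customers)

# Context-aware reading: amount is gross before tax and returns,
# and test accounts must be excluded from customer counts.
net_revenue = sum(o["amount"] - o["tax"] for o in orders if not o["returned"])
real_customers = sum(1 for c in customers if "test" not in c["email"])

print(naive_revenue, net_revenue, naive_customers, real_customers)
```

The rows are identical and perfectly fresh in both readings; only the second one is correct, and everything that makes it correct lives outside the table.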
Humans compensated for this. Experienced analysts carried business context in their heads. They knew which tables to trust, which columns meant what, and which calculations had tricky edge cases. The data was fresh, and the human added the context.
Then AI agents showed up, and the human-as-context-layer pattern broke completely.
Wave 2: Fresh Context (2024 to 2026)
AI agents don’t have institutional knowledge. They don’t know your business conventions. They can’t call a colleague to ask which table is the source of truth. Every query is their first query, and they need everything explained explicitly.
This created a new requirement: not just fresh data, but fresh understanding of what that data means. The metric that matters shifted from data latency to context latency.
Context latency is the time between when something changes about your business (a new product launches, a metric definition evolves, a schema migrates, a business rule updates) and when the systems consuming your data understand that change.
In most organizations today, context latency is measured in days or weeks:
- A new product launches on Monday. The semantic layer is updated on Thursday when the data team gets to the ticket.
- A schema migration renames columns on Wednesday. The agent’s metadata is corrected the following Monday.
- The finance team changes how they calculate net revenue. The documentation is updated next quarter.
This was tolerable when humans were the consumer, because humans are adaptive. They hear about the new product launch in a Slack channel. They notice the schema change when their query breaks and fix it. They get the memo about the new revenue calculation.
Agents don’t hear about things in Slack. They don’t notice anomalies in the same way. They work with whatever metadata and definitions they’ve been given, and if those definitions are stale, the answers are wrong.
Wave 2 is about making context as fresh as data. When a new product launches and new tables appear, the agent’s understanding of revenue should update within seconds, not days. When a schema migrates, the agent’s metadata should reflect the new column names immediately. When a business rule changes, the agent should apply the new rule from that moment forward.
The technology building blocks for Wave 2 include:
- CDC with schema evolution tracking. CDC platforms already capture schema changes alongside data changes. The missing piece is routing those schema events to semantic layers and agent metadata stores.
- Streaming semantic layers. Metric definitions that update in real time as the underlying data and schemas evolve. This is where traditional semantic layers (dbt metrics, Cube) need to go.
- Context delivery protocols. Standards like MCP (Model Context Protocol) that give agents structured access to both data and the context about that data. Not just “here’s the query result” but “here’s the result, here’s what the metric means, here’s how current the data is, and here’s what changed since the last query.”
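The building blocks above can be sketched end to end in a few lines. This is a minimal illustration, not a real API: the event shape, store layout, and response fields are all assumptions. It shows a CDC schema-evolution event being routed into an agent-facing metadata store, and a context-enriched response that reports how current that context is:

```python
import time

# Hypothetical in-memory metadata store, keyed by table name.
metadata = {
    "orders": {"columns": ["id", "amount", "created_at"], "updated_at_ms": 0},
}

def on_schema_change(event):
    """Route a CDC schema-evolution event into the agent metadata store."""
    metadata[event["table"]] = {
        "columns": event["columns"],
        "updated_at_ms": event["ts_ms"],
    }

def context_for(table, now_ms):
    """MCP-style reply: not just columns, but how current the context is."""
    entry = metadata[table]
    return {
        "columns": entry["columns"],
        "context_latency_ms": now_ms - entry["updated_at_ms"],
    }

# A migration renames amount -> gross_amount; the agent's view follows
# in one step instead of waiting for a human to update a ticket.
on_schema_change({
    "table": "orders",
    "columns": ["id", "gross_amount", "created_at"],
    "ts_ms": int(time.time() * 1000),
})
```

The point of the sketch is the wiring, not the store: once schema events already flowing through CDC are routed to wherever agents read their metadata, context latency collapses from a ticket queue to an event hop.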
Success metric for Wave 2: Context latency in seconds. How fast does a new business concept, changed definition, or evolved schema become available to agents?
The Compounding Effect
Fresh data with stale context produces a specific class of errors: technically current numbers that are wrong because they’re calculated or interpreted incorrectly.
Stale data with fresh context produces a different class: correctly calculated numbers that don’t reflect the current state of the business.
Only the combination of fresh data and fresh context produces answers that are both current and correct.
This is why Wave 2 isn’t a replacement for Wave 1. It’s additive. You need CDC for data freshness AND context streaming for context freshness. The infrastructure stack grows:
| Metric | Wave 1 only | Wave 2 only | Both |
|---|---|---|---|
| Data latency | Seconds | Hours to days | Seconds |
| Context latency | Days to weeks | Seconds | Seconds |
| Agent accuracy | Low (fresh but misunderstood data) | Low (correct interpretation of stale data) | High |
Companies that invested in Wave 1 infrastructure have a head start on Wave 2 because they already have the streaming foundation. CDC pipelines that capture schema evolution events are the raw material for context streaming. The incremental step is routing those events to semantic layers and agent metadata stores.
Companies that skipped Wave 1 and are still running batch ETL face a compounding deficit: they need to solve both data freshness and context freshness simultaneously, against competitors who are already streaming both.
Wave 3: Fresh Decisions (Emerging)
There’s a third wave forming that goes beyond data and context to the decisions themselves.
In Waves 1 and 2, agents query data and context, then make decisions based on what they find. The decision-making logic lives in the agent’s code, its prompts, its tool definitions. That logic is essentially static between deployments. If you want an agent to change how it makes decisions, you update its code or prompts and redeploy.
Wave 3 asks: what if the decision logic itself was streaming? What if agents could evolve their decision patterns in real time based on outcomes?
Consider an inventory restocking agent. In Waves 1 and 2, this agent queries fresh inventory data, consults a context layer to understand reorder points and supplier lead times, and triggers a restock order when inventory drops below a threshold. The threshold is configured statically.
In Wave 3, the threshold adjusts continuously based on real-time signals. A sudden spike in demand (detected via streaming CDC from the orders database) triggers an immediate threshold increase. A supplier delay (detected via an event from the procurement system) shifts orders to alternative suppliers. Seasonal patterns, detected through streaming aggregation of historical data, pre-adjust inventory levels before demand materializes.
This is where Apache Flink enters the picture. Flink provides the streaming compute layer for decision logic that operates on continuous event streams. A Flink job can:
- Aggregate signals from multiple real-time sources
- Apply windowed calculations (trending demand over the last 4 hours)
- Pattern-match across event sequences (this combination of events has preceded stockouts 80% of the time)
- Trigger actions based on complex, evolving conditions
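Flink itself is JVM-based, but the windowed logic such a job would run can be sketched in plain Python. This is a toy sliding-window aggregate, with the window size taken from the example above and the threshold rule invented for illustration:

```python
from collections import deque

WINDOW_SECONDS = 4 * 3600  # trending demand over the last 4 hours

class DemandWindow:
    """Minimal sliding-window aggregate over order events."""

    def __init__(self):
        self.events = deque()  # (timestamp_s, quantity)
        self.total = 0

    def add(self, ts, qty):
        self.events.append((ts, qty))
        self.total += qty
        # Evict events that have fallen out of the 4-hour window.
        while self.events and self.events[0][0] <= ts - WINDOW_SECONDS:
            _, old_qty = self.events.popleft()
            self.total -= old_qty

    def reorder_threshold(self, base=100):
        # Hypothetical rule: double the restock threshold on a demand spike.
        return base * 2 if self.total > 500 else base

w = DemandWindow()
for ts, qty in [(0, 200), (600, 200), (1200, 150)]:
    w.add(ts, qty)
```

In a real deployment this state lives inside a Flink operator with checkpointing and event-time semantics; the sketch only shows the shape of the computation, namely a threshold that is a continuous function of the stream rather than a static config value.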
The agent isn’t just querying fresh data with fresh context. It’s running continuous computation over event streams and making decisions that evolve as conditions change. This is qualitatively different from a request-response agent that queries a warehouse.
Success metric for Wave 3: Decision latency. How fast does a change in conditions translate to a changed decision or action?
The Industry Map
These three waves map onto distinct layers of the data infrastructure stack, and different companies are positioned at different layers:
Wave 1 players (data freshness): Fivetran, Airbyte, Debezium, database-native CDC, cloud-native streaming services. This layer is maturing and beginning to commoditize. Competition is increasingly on price, reliability, and breadth of connectors.
Wave 2 players (context freshness): dbt (metrics layer), Cube, AtScale, plus the emerging category of agent-focused metadata services. This layer is nascent, and nobody has fully captured it yet. The winner will be whoever integrates streaming data change events into semantic definitions most effectively.
Wave 3 players (decision freshness): Confluent (ksqlDB), Apache Flink ecosystem, and the emerging “agent compute” category. This layer is early and mostly occupied by infrastructure-heavy platforms that require significant expertise to operate.
The strategic question for data platform companies is: which waves do you serve?
Companies that only serve Wave 1 will face margin pressure as data movement commoditizes. The differentiation is no longer “can you stream CDC?” but “what can you do with the stream?”
Companies that serve Waves 1 and 2 can charge for value (agent accuracy) rather than volume (bytes moved). The context layer is where business-specific knowledge lives, and it’s harder to commoditize because every company’s context is different.
Companies that serve all three waves own the full pipeline from raw database changes to autonomous decisions. This is the most valuable position and the hardest to build.
Where Streamkap Fits
Streamkap’s trajectory maps directly onto these three waves:
Wave 1 (today): Production-grade CDC from PostgreSQL, MySQL, MongoDB, and other databases to warehouses, lakes, and operational stores. This is the foundation. Data latency measured in seconds. Schema evolution handled automatically. Exactly-once delivery guarantees.
Wave 2 (building): Context streaming that routes schema changes, metadata updates, and structural events to semantic layers and agent metadata stores. As agents become a primary consumer of data infrastructure, keeping their context current becomes as important as keeping their data current.
Wave 3 (roadmap): Managed Flink for agent compute. Streaming decision logic that operates on continuous event streams, applies evolving rules, and triggers actions autonomously. This is where agents go from “query and respond” to “continuously process and decide.”
This isn’t just a product roadmap. It’s a thesis about how the data infrastructure industry evolves. The center of gravity is shifting up the stack, from raw data movement to contextualized data to autonomous decisions. Companies that anticipate this shift and build accordingly will define the next era of data infrastructure.
What This Means for You
If you’re running batch ETL today, you’re behind on Wave 1. Start here. Set up CDC from your production databases. Streamkap can have a pipeline running in minutes. The immediate payoff is fresher data and a foundation for everything that follows.
If you have streaming data but your agents are inaccurate, you’re in the Wave 1 to Wave 2 gap. Your data is fresh but your context is stale. Start building a context layer, even a simple one. Document your key metrics and make those definitions available to agents. Then connect CDC schema evolution events to keep those definitions current.
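A "simple context layer" can start as little more than versioned metric definitions that agents can read alongside query results. The metric name, definition text, and caveats below are placeholders, not a recommended schema:

```python
from datetime import datetime, timezone

# A minimal context layer: metric definitions an agent reads with its data.
METRICS = {
    "net_revenue": {
        "definition": "SUM(amount - tax) over non-returned orders",
        "caveats": [
            "amount is gross, before tax and returns",
            "fiscal year starts in February",
        ],
        "updated_at": datetime(2026, 1, 5, tzinfo=timezone.utc),
    },
}

def describe_metric(name):
    """What an agent should receive alongside every query result."""
    m = METRICS[name]
    caveats = "; ".join(m["caveats"])
    return (f"{name}: {m['definition']} "
            f"(caveats: {caveats}; last reviewed {m['updated_at'].date()})")
```

Even a dictionary like this beats nothing: it gives agents explicit definitions instead of guesses, and once it exists, wiring CDC schema events to update it is the incremental step rather than a ground-up build.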
If you’re thinking about agents that operate autonomously (restocking inventory, adjusting pricing, routing support tickets), you’re looking at Wave 3. This requires streaming compute (Flink) on top of fresh data and fresh context. It’s the most complex layer, but it’s also where the highest-value use cases live.
The progression is sequential. You can’t effectively do Wave 2 without Wave 1 (fresh context on stale data is still wrong). You can’t do Wave 3 without Waves 1 and 2 (autonomous decisions need both fresh data and fresh context).
Start where you are. Stream everything you can. Define what matters. The infrastructure you build for today’s use cases becomes the foundation for what’s next.
Ready to build the foundation for all three waves? Streamkap delivers production-grade CDC with sub-second latency, schema evolution tracking, and a roadmap toward context streaming and managed Flink. Start a free trial or learn more about agents.