Stream Processing Architecture: Patterns for Real-Time Data Systems
An architect's guide to stream processing patterns - Lambda, Kappa, event sourcing, CQRS, materialized views, and exactly-once semantics - with decision frameworks for each.
The phrase “real-time data system” covers a spectrum from operational databases refreshed every few seconds to sub-millisecond event processing pipelines. The architectural patterns that work at one end of this spectrum fail at the other. Senior engineers and architects designing real-time systems must understand not just what patterns exist, but when each pattern applies - and what it costs.
This guide covers the six most important architectural patterns for stream processing systems, with decision frameworks, tradeoffs, and the questions you need to answer before choosing.
Setting the Context: What Is Stream Processing?
Stream processing is the continuous computation of results over an unbounded sequence of events. Unlike batch processing - which operates on a bounded, complete dataset - stream processing must produce results incrementally as events arrive, without waiting for a “final” dataset.
This creates a set of fundamental tensions that every streaming architecture must resolve:
- Completeness vs. latency: Waiting longer produces more accurate results; acting faster requires accepting incomplete information
- Consistency vs. availability: Guaranteeing exactly-once processing requires coordination that adds latency and reduces availability
- Complexity vs. capability: More powerful processing semantics require more complex infrastructure
Every pattern in this guide represents a different resolution of these tensions.
Pattern 1: Lambda Architecture
Lambda architecture, formalized by Nathan Marz circa 2012, addresses a core problem: stream processing at the time was not accurate enough for analytical workloads, but batch processing was too slow for operational use cases. Lambda’s solution: run both.
Structure
Raw Events ─┬─► Batch Layer (Hadoop, Spark)
            │        │
            │        ▼
            │   Batch Views (complete, accurate, slow) ──┐
            │                                            ├─► Serving Layer ──► Query
            │                                            │   (merge at query time)
            └─► Speed Layer (Storm, Spark Streaming)     │
                     │                                   │
                     ▼                                   │
                Real-time Views (incomplete, fast) ──────┘
Batch layer: Processes the complete event history. Results are accurate because they operate on a full, immutable dataset. Latency: hours.
Speed layer: Processes only the most recent events not yet covered by the batch layer. Results are approximate or incomplete. Latency: seconds.
Serving layer: Merges results from both layers at query time. The query returns the batch result for old data and the speed-layer result for recent data.
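The query-time merge can be sketched in a few lines. This is a hypothetical illustration, not code from any Lambda framework: the batch view holds accurate counts up to a cutoff, the speed view holds counts for events after it, and the serving layer sums the two per key.

```python
# Hypothetical sketch of Lambda's query-time merge. All names are
# illustrative; real serving layers (e.g. over Cassandra or HBase)
# follow the same shape.

def merge_views(batch_view, speed_view, key):
    """Return the batch count plus the speed-layer count for a key."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

# Batch view: accurate counts over the full history up to the cutoff.
batch_view = {"us-east": 10_000, "eu": 4_200}
# Speed view: counts for events that arrived after the cutoff.
speed_view = {"us-east": 37, "eu": 5, "apac": 2}

print(merge_views(batch_view, speed_view, "us-east"))  # 10037
```

Note that a key seen only by the speed layer ("apac" above) still resolves correctly, because each layer defaults to zero for keys it has not covered.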
When Lambda Makes Sense
- When your stream processing logic cannot be expressed incrementally (global top-N rankings, exact distinct counts, ML inference over full history)
- When batch accuracy requirements are regulatory or contractual
- When the team has existing investment in a batch processing platform
Why Lambda Falls Out of Favor
The fundamental problem with Lambda is code duplication. You write every computation twice: once in the batch framework (Spark, Hive), once in the stream framework (Storm, Kafka Streams). These implementations inevitably diverge over time. Bug fixes applied to one path are not automatically applied to the other. The “query time merge” logic adds additional complexity.
Modern stream processors have largely eliminated the accuracy gap that motivated Lambda. Apache Flink with event-time processing, watermarks, and replayable Kafka topics can produce batch-accurate results from a streaming computation. For most new systems in 2026, Lambda is an antipattern.
Pattern 2: Kappa Architecture
Kappa architecture, proposed by Jay Kreps in 2014, eliminates Lambda’s batch layer entirely. The insight: if you have a replayable log of immutable events (Kafka, a durable message queue), you can always reprocess history by replaying from the beginning. There is no need for a separate batch layer.
Structure
Immutable Event Log (Kafka)
│
│ Stream Processing (Flink, Kafka Streams)
▼
Serving Layer ──► Query
For reprocessing (bug fix, backfill, algorithm change), start a new consumer group from the beginning of the log:
Immutable Event Log (Kafka, long retention)
├── Consumer Group A (current version, reading from offset N)
└── Consumer Group B (new version, replaying from offset 0)
│ (when B catches up to A, switch over)
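The reprocessing flow above can be sketched with an in-memory list standing in for the Kafka log. This is a minimal illustration under that assumption, not Kafka client code: two processing functions replay the same immutable log, and traffic switches to the new version once it has caught up.

```python
# Minimal sketch of Kappa reprocessing. The list stands in for a Kafka
# topic; the two functions stand in for consumer groups A and B.

log = [{"amount": a} for a in (10, 20, 30, 40)]  # append-only event log

def process_v1(events):
    """Current version of the job."""
    return sum(e["amount"] for e in events)

def process_v2(events):
    """New version, e.g. after a bug fix that excludes small amounts."""
    return sum(e["amount"] for e in events if e["amount"] >= 20)

# Consumer group A: current version, has consumed the whole log.
state_a = process_v1(log)
# Consumer group B: new version, replays from offset 0.
state_b = process_v2(log)

# Once B catches up to A's offset, queries switch to B's state.
print(state_a, state_b)  # 100 90
```

The sketch also shows why determinism matters: if `process_v2` depended on wall-clock time or external lookups, the replayed state would not be reproducible.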
Kappa Requirements
Kappa architecture makes a strong assumption: the event log is complete and replayable. This requires:
- Long retention in Kafka (days to months, not hours)
- Immutable events: never mutate past events; append-only
- Deterministic processing: same input produces same output on replay
- Stream-capable logic: all business logic can be expressed as incremental computation
Lambda vs. Kappa Decision Matrix
| Consideration | Lambda | Kappa |
|---|---|---|
| Processing logic incrementally expressible | Optional | Required |
| Duplicate code maintenance | Yes | No |
| Historical reprocessing | Fast (batch) | Possible (replay, slower) |
| Operational complexity | High (two layers) | Medium (one layer) |
| Recommended for new systems (2026) | Only if needed | Default choice |
Pattern 3: Event Sourcing
Event sourcing inverts the traditional relationship between state and history. In a standard CRUD database, you store current state and discard the operations that produced it. In event sourcing, you store the operations (events) as the primary record, and derive current state by replaying them.
Structure
Command ──► Event Store (append-only log of immutable events)
│
│ Projection (replay events to build state)
▼
Current State (derived, rebuildable)
A bank account in event sourcing:
Events (the source of truth):
- AccountOpened(id=101, initial_balance=0)
- MoneyDeposited(id=101, amount=500, ts=2024-01-10)
- MoneyWithdrawn(id=101, amount=200, ts=2024-01-15)
- MoneyDeposited(id=101, amount=1000, ts=2024-01-20)
Current state (derived by replaying events):
balance = 0 + 500 - 200 + 1000 = 1300
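The bank-account example can be replayed in code. The event names mirror the list above; the fold over the event list is the projection that derives current state.

```python
# The article's bank-account events as (type, payload) tuples.
events = [
    ("AccountOpened",  {"id": 101, "initial_balance": 0}),
    ("MoneyDeposited", {"id": 101, "amount": 500}),
    ("MoneyWithdrawn", {"id": 101, "amount": 200}),
    ("MoneyDeposited", {"id": 101, "amount": 1000}),
]

def project_balance(events):
    """Replay events in order to derive the current balance."""
    balance = 0
    for kind, data in events:
        if kind == "AccountOpened":
            balance = data["initial_balance"]
        elif kind == "MoneyDeposited":
            balance += data["amount"]
        elif kind == "MoneyWithdrawn":
            balance -= data["amount"]
    return balance

print(project_balance(events))  # 1300
```

A temporal query falls out for free: replaying only the events up to January 12 (`events[:2]` here) yields the balance as of that date, 500.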
Benefits of Event Sourcing
Complete audit trail: Every state change is recorded with timestamp and context. This is not a separate audit log bolted on - it is the primary data model.
Temporal queries: “What was the account balance on January 12?” - replay events up to that date.
Event replay: Debug, test, and rebuild downstream systems by replaying the event log.
Decoupled downstream consumers: Any number of projections can be built from the same event stream without modifying the source application.
When Event Sourcing Adds Complexity Without Benefit
Event sourcing is not appropriate for all domains:
- High-frequency updates to the same entity: Rebuilding state by replaying 10 million events for a single account is impractical without snapshotting
- Simple CRUD applications: The added complexity is not justified if audit trails and temporal queries are not requirements
- Teams unfamiliar with the pattern: Event sourcing requires a significant mental model shift; the operational complexity is real
Snapshotting addresses the replay performance problem: periodically save a snapshot of current state, so replay only needs to cover events since the last snapshot.
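A sketch of that snapshotting scheme, with a list standing in for the event store: persist `(offset, state)` periodically, then rebuild by loading the snapshot and replaying only the tail of the log.

```python
# Illustrative snapshotting sketch: the names and storage format are
# assumptions, not any specific event-store API.

def replay(state, events):
    """Fold a slice of the log into the running state."""
    for e in events:
        state += e["amount"]
    return state

log = [{"amount": i} for i in range(1, 1001)]  # 1000 deposit events

# Snapshot taken after event 900: offset plus the state at that point.
snapshot = {"offset": 900, "state": replay(0, log[:900])}

# Rebuild: start from the snapshot, replay only the last 100 events...
fast = replay(snapshot["state"], log[snapshot["offset"]:])
# ...instead of replaying all 1000 from scratch.
full = replay(0, log)

print(fast == full)  # True
```

The rebuild touches 100 events instead of 1000; for the "10 million events per account" case in the text, the same idea turns an impractical replay into a bounded one.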
Event Sourcing vs. CDC
Both capture state changes as events. The distinction:
- Event sourcing: The application writes events as its primary state store
- CDC: Infrastructure captures events from an existing database’s transaction log
CDC gives event-sourcing-like benefits (replayable changelog) to applications not designed with event sourcing. Change data capture pipelines can serve as the streaming foundation for downstream event-sourcing patterns.
Pattern 4: CQRS (Command Query Responsibility Segregation)
CQRS separates the write model (command side) from the read model (query side). Instead of a single database serving both reads and writes, the write path and read path use different, optimized stores.
Structure
Command ──► Write Model (normalized, ACID, optimized for writes)
│
│ Event / CDC stream
▼
Read Models (denormalized, optimized for specific query patterns)
├── Read Model A: User profile view (key-value store)
├── Read Model B: Search index (Elasticsearch)
└── Read Model C: Analytics aggregate (ClickHouse / Snowflake)
CQRS in Practice
A product catalog service using CQRS:
| Layer | Store | Purpose |
|---|---|---|
| Write model | PostgreSQL (normalized) | ACID writes, foreign key constraints |
| Read model: API | Redis (denormalized JSON) | Sub-millisecond product lookups |
| Read model: Search | Elasticsearch | Full-text search, faceted filtering |
| Read model: Analytics | ClickHouse | Category-level aggregations, pricing trends |
The read models are maintained by CDC pipelines that stream changes from PostgreSQL to each downstream store. When a product is updated in PostgreSQL, the change propagates to Redis, Elasticsearch, and ClickHouse within seconds.
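The propagation step can be sketched as a projection function: one change event from the write model is applied to read models shaped for their query patterns. Plain dicts stand in for Redis and Elasticsearch here; the event shape is illustrative, not a specific CDC format.

```python
# Hypothetical CDC event from the write model (PostgreSQL).
change = {"op": "update", "table": "products",
          "row": {"id": 7, "name": "Widget", "category": "tools",
                  "price": 9.99}}

kv_store, search_index = {}, {}  # stand-ins for Redis / Elasticsearch

def apply_to_read_models(event):
    """Project one change event into each read model."""
    row = event["row"]
    # Key-value read model: denormalized document keyed by product id.
    kv_store[f"product:{row['id']}"] = row
    # Search read model: tokenized name mapped to matching product ids.
    for token in row["name"].lower().split():
        search_index.setdefault(token, set()).add(row["id"])

apply_to_read_models(change)
print(kv_store["product:7"]["price"], sorted(search_index["widget"]))
```

Each read model gets its own projection of the same event, which is exactly the "separate pipeline per read model" cost listed below.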
CQRS Tradeoffs
Benefits:
- Read models optimized for specific access patterns - no index tuning compromises
- Write path isolated from read load
- Independent scaling of read and write tiers
- Natural fit for event sourcing (events on the write side fan out to read models)
Costs:
- Eventual consistency: read models lag behind the write model (typically seconds)
- Operational complexity: multiple stores to monitor, backup, and keep synchronized
- Code complexity: each read model requires a separate projection/synchronization pipeline
CQRS is most valuable when read and write patterns are highly divergent - high-volume reads with complex filtering, and transactional writes with integrity constraints. It is overkill for applications with simple, symmetric read/write patterns.
Pattern 5: Materialized Views
A materialized view is a pre-computed, stored result of a query. In streaming systems, materialized views are continuously updated as new events arrive - providing the query performance of a pre-aggregated snapshot with the freshness of a real-time stream.
Static vs. Streaming Materialized Views
| Type | Updated | Freshness | Example |
|---|---|---|---|
| Database materialized view | On schedule or on demand | Minutes to hours | Nightly summary tables |
| Streaming materialized view | Continuously, as events arrive | Seconds | Real-time dashboard aggregates |
Streaming Materialized View Patterns
Windowed aggregation: Count events within a time window.
-- Flink SQL: tumbling 1-hour order count by region
SELECT
region,
TUMBLE_START(event_time, INTERVAL '1' HOUR) AS window_start,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM orders
GROUP BY
region,
TUMBLE(event_time, INTERVAL '1' HOUR)
Changelog view (upsert stream): Maintain a current-state view of a mutable entity.
CDC events (inserts, updates, deletes) ──► Flink stateful processing
│
▼
Current-state table (latest row per key)
│
▼
Redis / ClickHouse / Snowflake
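The stateful step in that diagram reduces to "keep the latest row per key, drop keys on delete." A minimal sketch of that fold, with a dict standing in for the processor's keyed state:

```python
# Illustrative changelog (upsert) fold. The event shape is an assumption,
# not a specific CDC format.

def apply_changelog(events):
    """Fold CDC events into a current-state table: latest row per key."""
    table = {}
    for e in events:
        if e["op"] == "delete":
            table.pop(e["key"], None)
        else:  # inserts and updates are both upserts
            table[e["key"]] = e["row"]
    return table

events = [
    {"op": "insert", "key": 1, "row": {"status": "new"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 2},
]

print(apply_changelog(events))  # {1: {'status': 'shipped'}}
```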
Denormalized join view: Pre-join two tables into a single flattened record.
This is the real-time enrichment pattern applied to produce a persistent materialized view rather than an enriched event stream.
Where Materialized Views Live
| Destination | Use Case |
|---|---|
| Redis | Sub-millisecond key lookups; operational read models |
| ClickHouse | Real-time analytics with second-range query latency |
| Elasticsearch | Search and filtering over current state |
| Snowflake / BigQuery | Pre-aggregated warehouse tables for BI tools |
| RocksDB (embedded in Flink) | Internal stream processor state |
Pattern 6: Fan-Out and Fan-In
Fan-out distributes a single stream to multiple consumers or downstream topics. Fan-in merges multiple streams into a single consolidated stream.
Fan-Out
Source stream (CDC events)
│
├──► Topic: events.raw (archive, compliance)
├──► Topic: events.enriched (ML features, analytics)
├──► Topic: events.filtered (privacy-scrubbed, for third parties)
└──► Topic: events.aggregated (pre-summed metrics)
Fan-out is implemented by:
- Multiple consumer groups reading the same topic independently (simplest; consumers own their own derived state)
- Stream processor with multiple sinks (one Flink job writes to multiple output topics)
- Kafka Connect with multiple sink connectors (each connector reads the same topic and writes to a different destination)
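The second variant (one stream processor, multiple sinks) can be sketched as a dispatch table: each output topic gets its own transformation of the same input event. The topic names follow the diagram above; the transforms are illustrative.

```python
# Sketch of stream-processor fan-out: one event, several derived outputs.

def fan_out(event, sinks):
    """Apply each sink's transform and collect per-topic outputs."""
    return {topic: transform(event) for topic, transform in sinks.items()}

sinks = {
    "events.raw":      lambda e: e,                            # archive as-is
    "events.filtered": lambda e: {k: v for k, v in e.items()
                                  if k != "email"},            # scrub PII
    "events.enriched": lambda e: {**e, "region": "us-east"},   # add a feature
}

event = {"user": 42, "email": "a@b.c", "amount": 10}
out = fan_out(event, sinks)
print(sorted(out))  # ['events.enriched', 'events.filtered', 'events.raw']
```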
Fan-In (Stream Merging)
Topic A: orders.us_east ──┐
Topic B: orders.us_west ──┤── Union ──► Topic: orders.global
Topic C: orders.eu ──┘
Fan-in is used to consolidate events from multiple regional or partitioned sources into a single global stream. Flink’s DataStream.union() or the SQL UNION ALL operator handles this. Key concern: event-time alignment - events from different sources may arrive with different clock skews, requiring watermark strategies that account for the slowest upstream partition.
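The watermark rule behind that concern is simple to state in code: the merged stream can only advance event time as far as its slowest input. A sketch, with per-source watermarks as plain timestamps:

```python
# Sketch of watermark propagation through a union. Source names follow
# the fan-in diagram; timestamps are illustrative epoch seconds.

def merged_watermark(source_watermarks):
    """Event-time watermark of a union is the minimum over its inputs."""
    return min(source_watermarks.values())

watermarks = {
    "orders.us_east": 1_700_000_060,
    "orders.us_west": 1_700_000_055,
    "orders.eu":      1_700_000_010,  # lagging source holds everyone back
}

print(merged_watermark(watermarks))  # 1700000010
```

This is why a single slow or idle partition stalls windowed results for the whole merged stream, and why Flink offers idleness handling for sources that stop emitting.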
Exactly-Once Semantics: Theory and Practice
Exactly-once processing guarantees that each event affects the output exactly once - no duplicates, no omissions - even in the presence of failures and retries. It is the most demanding delivery guarantee in stream processing.
The Three Delivery Guarantees
| Guarantee | Description | Failure behavior |
|---|---|---|
| At-most-once | Each event processed zero or one time | Data loss on failure |
| At-least-once | Each event processed one or more times | Duplicates on failure |
| Exactly-once | Each event processed exactly once | No data loss, no duplicates |
How Exactly-Once Is Achieved
End-to-end exactly-once requires coordination at three levels:
- Source: The source must be replayable (Kafka with committed offsets) and must not advance its offset until processing is committed downstream
- Processor: The stream processor must checkpoint state atomically - Flink’s distributed snapshots (Chandy-Lamport algorithm) achieve this
- Sink: The destination must support either idempotent writes or transactional commits coordinated with the processor’s checkpoint
Flink’s exactly-once mode coordinates checkpoints with Kafka offset commits and transactional sink writes using a two-phase commit protocol. The checkpoint interval (typically 1-30 seconds) determines the granularity of the exactly-once guarantee - and the latency floor.
When Exactly-Once Is Worth the Cost
| Operation | Idempotent? | Recommendation |
|---|---|---|
| Write row with primary key (Snowflake, Postgres) | Yes | At-least-once + idempotent write |
| Increment counter | No | Exactly-once required |
| Send notification / trigger webhook | No | Exactly-once, or deduplication at consumer |
| Compute deterministic aggregate | Yes (if state is checkpointed) | At-least-once sufficient |
| Financial calculation (debit/credit) | No | Exactly-once required |
The practical guidance: use at-least-once delivery with idempotent destination writes as the default. Reserve exactly-once for operations that are genuinely non-idempotent and where duplicate execution has real-world consequences.
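That default is easy to demonstrate. In this sketch (dicts and counters standing in for real sinks), a redelivered event overwrites its previous write in the keyed store but double-counts in the naive counter:

```python
# At-least-once delivery against two sinks: one idempotent (upsert by
# primary key), one not (blind increment). The redelivery is simulated.

def idempotent_write(store, event):
    """Upsert by primary key: redelivery overwrites, never duplicates."""
    store[event["id"]] = event["amount"]

def naive_increment(counter, event):
    """Blind increment: redelivery double-counts."""
    return counter + event["amount"]

delivery = [{"id": "e1", "amount": 5},
            {"id": "e2", "amount": 7},
            {"id": "e2", "amount": 7}]  # e2 redelivered after a retry

store, counter = {}, 0
for e in delivery:
    idempotent_write(store, e)
    counter = naive_increment(counter, e)

print(sum(store.values()), counter)  # 12 19
```

The keyed store lands on the correct total (12) despite the duplicate; the counter drifts to 19, which is the case that genuinely needs exactly-once.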
Architecture Decision Framework
When designing a new stream processing system, work through these decisions in order:
Decision 1: Single-layer or two-layer?
Is your processing logic fully expressible as incremental computation?
- Yes → Kappa (single streaming layer)
- No (global aggregations, ML training, complex algorithms) → Lambda (streaming + batch layers)
Decision 2: Event sourcing or CDC?
Are you building a new application or integrating with existing databases?
- New application, audit trail is a core requirement → Event sourcing
- Existing databases, need streaming without app changes → CDC
Decision 3: Unified read/write model or CQRS?
Are read and write patterns symmetric or divergent?
- Symmetric (simple key lookups, no complex query patterns) → Single store, no CQRS
- Divergent (complex search, analytics, operational APIs) → CQRS with streaming materialized views
Decision 4: Exactly-once or at-least-once?
Are the operations in your pipeline idempotent?
- Yes → At-least-once with idempotent destination writes
- No → Exactly-once (accept higher latency and operational complexity)
Decision 5: Fan-out topology?
How many consumers need the stream?
- One → Direct stream; consumer owns its derived state
- Many, same schema → Multiple consumer groups on the same topic
- Many, different schemas or SLAs → Kafka fan-out with per-consumer derived topics
Putting It Together: A Reference Architecture
A production-grade real-time data platform for a mid-size engineering organization:
Source Systems (PostgreSQL, MySQL, MongoDB)
│
│ CDC (Streamkap)
▼
Kafka (immutable event log, 30-day retention)
│
┌────┴─────┐
│ │
▼ ▼
Flink Direct consumers
(Kappa (microservices reading
processing) raw CDC topics)
│
├──► Materialized views → Redis (operational read models)
├──► Enriched streams → Snowflake (analytics, CQRS read model)
├──► Aggregated metrics → ClickHouse (real-time dashboards)
└──► Search indexes → Elasticsearch (full-text search)
This architecture uses Kappa (no batch layer), CQRS (write in normalized source DBs, read from optimized derived stores), materialized views (pre-computed aggregates in ClickHouse), and fan-out (Kafka feeding both Flink and direct consumers). Delivery is at-least-once with idempotent writes to all destinations.
Picking the Right Architecture
No single stream processing architecture is universally correct. Lambda is operationally expensive but necessary when incremental computation cannot match batch accuracy. Kappa is simpler and increasingly capable but requires a replayable event log and incremental-friendly business logic. Event sourcing and CQRS add power for applications that need complete audit histories and divergent read models, at the cost of significant complexity.
The most common mistake in stream processing architecture is over-engineering - applying Lambda when Kappa suffices, implementing exactly-once when at-least-once with idempotent writes achieves the same result at lower cost. The second most common mistake is under-engineering - choosing at-least-once without verifying that downstream operations are idempotent.
Work through the decision framework above honestly. The right architecture for your system is the one that satisfies your correctness, latency, and operational constraints with the minimum necessary complexity.