Fan-Out and Fan-In Patterns in Stream Processing
How to implement fan-out (one-to-many) and fan-in (many-to-one) patterns in streaming pipelines. Covers topic routing, parallel processing, and stream merging with Kafka and Flink.
If you have spent any time building streaming pipelines, you have run into a version of the same two problems over and over: “I need to send this one stream to many places” and “I need to combine many streams into one place.” These are fan-out and fan-in, and they are two of the most useful patterns in stream processing.
They sound simple, and conceptually they are. But the implementation details - ordering guarantees, schema compatibility, error isolation, backpressure - are where things get interesting. This article breaks down both patterns, walks through practical implementations with Kafka and Flink SQL, and covers the gotchas that bite teams in production.
What Fan-Out Actually Looks Like
Fan-out takes a single input stream and routes events to multiple downstream consumers or topics. The “one-to-many” pattern. You see it everywhere: a CDC stream from your orders table feeding shipping, billing, and analytics systems simultaneously. A single event log splitting into region-specific streams. An ingest pipeline distributing work across parallel processors.
There are three common flavors of fan-out, and picking the right one matters.
Content-Based Routing
This is the most common form. Events are inspected and routed based on their content - a field value, an event type, a header. A single Kafka topic containing mixed event types gets split into dedicated topics for each type.
Say you have a transactions topic where every event includes a region field. Content-based routing sends US transactions to transactions-us, EU transactions to transactions-eu, and APAC transactions to transactions-apac. Each downstream consumer only sees the events it cares about.
In Flink SQL, this is straightforward:
-- Route US transactions to their own topic
INSERT INTO transactions_us
SELECT * FROM transactions
WHERE region = 'US';
-- Route EU transactions
INSERT INTO transactions_eu
SELECT * FROM transactions
WHERE region = 'EU';
Each INSERT INTO statement creates an independent Flink job that reads from the source topic and writes to the destination. The source topic is read once per downstream job, which is fine for moderate fan-out factors (say, 2-10 outputs). For higher fan-out, you will want to look at side outputs or a single job with multiple sinks.
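The same routing decision can live outside Flink, in a small consumer-side router. Here is a minimal Python sketch of content-based routing - the topic names mirror the example above, but the event shape and the default topic are illustrative assumptions, not part of any real pipeline:

```python
# Content-based routing: pick a destination topic from a field value.
# The event dict shape and the "transactions-unrouted" fallback topic
# are illustrative assumptions.

ROUTES = {
    "US": "transactions-us",
    "EU": "transactions-eu",
    "APAC": "transactions-apac",
}

def route(event: dict, default_topic: str = "transactions-unrouted") -> str:
    """Return the destination topic for an event based on its region field."""
    return ROUTES.get(event.get("region"), default_topic)

events = [
    {"id": 1, "region": "US"},
    {"id": 2, "region": "EU"},
    {"id": 3, "region": "LATAM"},  # no dedicated topic -> falls back to default
]
destinations = [route(e) for e in events]
```

The important design choice is the explicit default: events that match no route should go somewhere observable rather than being silently dropped.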
Broadcast
Broadcast sends every event to every downstream consumer, no filtering. This is useful when multiple systems need to see the full stream - for example, when you have both a real-time dashboard and an audit log that need every single event.
In Kafka, broadcast is implicit. Multiple consumer groups reading from the same topic each get every message independently. No special configuration needed. Each consumer group tracks its own offsets and processes at its own pace.
The key thing to understand: broadcast does not copy data. Kafka stores the messages once, and each consumer group simply maintains a separate cursor. This is one of Kafka’s architectural strengths - fan-out via consumer groups is essentially free in terms of storage.
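The cursor model is easy to see in miniature. This Python sketch is a teaching simulation, not a Kafka client: one stored copy of the messages, and each "consumer group" is nothing more than an independent offset:

```python
# Simulation of Kafka-style broadcast via consumer groups: the log is stored
# once, and each group is just a separate cursor into it.

class MiniLog:
    def __init__(self):
        self.messages = []   # single copy of the data
        self.offsets = {}    # group_id -> next offset to read

    def append(self, msg):
        self.messages.append(msg)

    def poll(self, group_id, max_records=10):
        start = self.offsets.get(group_id, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group_id] = start + len(batch)  # "commit" the new offset
        return batch

log = MiniLog()
for i in range(5):
    log.append(f"event-{i}")

dashboard = log.poll("dashboard-group")               # reads all 5 at once
audit_first = log.poll("audit-group", max_records=2)  # same data, its own pace
```

Both groups see every message, the data exists once, and neither group's progress affects the other - which is exactly the property that makes consumer-group fan-out cheap.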
Partition-Based Fan-Out
Sometimes fan-out happens at the partition level. You configure your producer to route events to specific partitions using a custom partitioner, and then assign different consumers to different partition ranges. This is less common than content-based routing but useful when you need strict ordering within a subset of your data.
For example, in a multi-tenant system, you might hash on tenant_id so that all events for a given tenant land in the same partition. Downstream, each consumer handles a specific set of tenants, giving you both parallelism and per-tenant ordering.
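The partitioner itself is a one-liner: a stable hash of the key, modulo the partition count. In this sketch CRC32 stands in for Kafka's default murmur2 hash - the specific hash differs, but the principle (same key, same partition, every time) is identical:

```python
# Keyed partitioning sketch: hash the tenant_id so every event for a tenant
# lands in the same partition. zlib.crc32 is a stand-in for Kafka's default
# murmur2; the partition count of 6 is an arbitrary example.
import zlib

NUM_PARTITIONS = 6

def partition_for(tenant_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a tenant to a partition deterministically."""
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions

# The same tenant always maps to the same partition, so its events stay ordered.
p1 = partition_for("tenant-42")
p2 = partition_for("tenant-42")
```

Because the mapping is deterministic, per-tenant ordering survives producer restarts and rebalances - but note that changing the partition count changes the mapping, which is why partition counts are usually fixed up front for keyed topics.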
What Fan-In Actually Looks Like
Fan-in is the inverse: multiple input streams merging into a single output stream. The “many-to-one” pattern. Think of CDC streams from PostgreSQL, MySQL, and MongoDB all feeding into a unified analytics topic. Or regional event streams from US, EU, and APAC clusters merging into a global view.
Fan-in has its own set of flavors, each with different trade-offs.
Union
Union is the simplest form - take events from multiple sources and write them all to the same destination, preserving every event. No deduplication, no aggregation, just concatenation.
In Flink SQL:
INSERT INTO unified_events
SELECT event_id, event_type, payload, event_time, 'postgres' as source_db
FROM postgres_cdc_stream
UNION ALL
SELECT event_id, event_type, payload, event_time, 'mysql' as source_db
FROM mysql_cdc_stream;
UNION ALL keeps all rows, including duplicates. This is usually what you want in streaming because deduplication across distributed sources is expensive and often unnecessary.
Merge with Schema Alignment
Real-world fan-in rarely involves sources with identical schemas. Your PostgreSQL table might have created_at as a timestamp, while MySQL uses creation_date as a datetime string. Merging these requires a transformation layer that normalizes schemas before writing to the unified stream.
This is where Flink SQL really shines. You can project, cast, and rename fields inline:
INSERT INTO unified_customers
SELECT
  CAST(id AS STRING) as customer_id,
  name,
  email,
  CAST(created_at AS TIMESTAMP(3)) as created_time,
  'postgres' as origin
FROM postgres_customers
UNION ALL
SELECT
  customer_ref as customer_id,
  full_name as name,
  email_address as email,
  CAST(creation_date AS TIMESTAMP(3)) as created_time,
  'mysql' as origin
FROM mysql_customers;
The unified topic has a clean, consistent schema regardless of the source. Downstream consumers never need to know that the data came from different databases with different naming conventions.
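The same normalization sometimes happens in application code, for example in a pre-processing step before events reach Kafka. A Python sketch of the idea - the field names mirror the SQL above, but the mapping functions themselves are illustrative:

```python
# Schema alignment outside SQL: map each source's record shape onto one
# unified shape. Field names follow the SQL example; the functions are
# an illustrative sketch, not a real connector.

def from_postgres(row: dict) -> dict:
    return {
        "customer_id": str(row["id"]),      # id is numeric in Postgres
        "name": row["name"],
        "email": row["email"],
        "origin": "postgres",
    }

def from_mysql(row: dict) -> dict:
    return {
        "customer_id": row["customer_ref"],  # already a string in MySQL
        "name": row["full_name"],
        "email": row["email_address"],
        "origin": "mysql",
    }

pg = from_postgres({"id": 7, "name": "Ada", "email": "ada@example.com"})
my = from_mysql({"customer_ref": "c-9", "full_name": "Grace",
                 "email_address": "grace@example.com"})
```

Whichever source a record came from, the output has the same keys and types - which is the whole contract the unified topic offers its consumers.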
Coalesce
Coalesce goes a step further than union - it merges streams and resolves conflicts. When the same entity appears in multiple source streams, coalesce picks the “winner” based on a rule: latest timestamp, highest priority source, or a custom merge function.
This pattern is common in multi-region architectures where the same customer record might be modified in different regions. You need a single, consistent view. Flink’s windowing and deduplication features handle this:
INSERT INTO canonical_customers
SELECT *
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY event_time DESC
    ) as row_num
  FROM unified_customers
)
WHERE row_num = 1;
This keeps only the most recent event for each customer, regardless of which source stream it came from.
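The imperative equivalent of that ROW_NUMBER() query is a latest-wins merge over a keyed map. A minimal Python sketch, with an illustrative event shape:

```python
# Latest-wins coalesce: keep only the most recent event per customer_id
# across the merged streams. The event dicts are an illustrative shape.

def coalesce_latest(events):
    latest = {}
    for e in events:
        key = e["customer_id"]
        # Replace the stored event only if this one is newer.
        if key not in latest or e["event_time"] > latest[key]["event_time"]:
            latest[key] = e
    return latest

merged = [
    {"customer_id": "c-1", "event_time": 100, "origin": "us-east"},
    {"customer_id": "c-1", "event_time": 250, "origin": "eu-west"},  # newer
    {"customer_id": "c-2", "event_time": 180, "origin": "us-east"},
]
canonical = coalesce_latest(merged)
```

The conflict-resolution rule lives in one comparison; swapping "latest timestamp" for "highest priority source" or a custom merge function changes only that line.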
Practical Architectures
These patterns are not academic exercises. They show up in nearly every production streaming system. Here are two architectures where fan-out and fan-in work together.
Multi-Tenant Data Isolation
A SaaS platform captures events from all tenants into a single Kafka topic. Downstream, different tenants have different requirements: some want their data in Snowflake, others in BigQuery, some need real-time Elasticsearch indexing.
The pipeline looks like this:
- Ingest - All tenant events land in a single events topic, partitioned by tenant_id.
- Fan-out - A Flink job reads the unified topic and routes events to tenant-specific topics based on configuration rules. High-value tenants get dedicated topics; smaller tenants share pooled topics.
- Destination routing - Each tenant-specific or pooled topic feeds the appropriate destination connector.
This pattern gives you the operational simplicity of a single ingest pipeline with the flexibility of per-tenant routing. Streamkap’s connector architecture is built around exactly this kind of topology - a single CDC source feeding multiple destinations through configurable routing.
Microservice Event Aggregation
In a microservice architecture, each service emits its own event stream: order-events, payment-events, inventory-events, shipping-events. An analytics team needs a unified view across all services to build dashboards and run queries.
- Fan-in - A Flink job reads from all four service topics, normalizes the schemas, and writes to a single business-events topic.
- Enrichment - Another Flink job joins the unified stream with reference data (customer profiles, product catalog) to produce enriched events.
- Fan-out - The enriched stream fans out to Snowflake for analytics, Elasticsearch for search, and a real-time dashboard service.
This creates the “diamond topology” - fan-in at the top, processing in the middle, fan-out at the bottom. It is one of the most common and most effective streaming architectures because it keeps the pipeline easy to reason about while supporting diverse downstream needs.
Ordering Considerations
Ordering is where fan-out and fan-in get tricky. Let’s be direct about what you can and cannot guarantee.
Within a single Kafka partition, ordering is guaranteed. Messages are appended in order, and consumers read them in order. This is Kafka’s fundamental contract.
Across partitions, there is no ordering guarantee. Messages in partition 0 and partition 1 are processed independently. If you need ordering for a specific entity (all events for customer 12345), you must ensure all events for that entity land in the same partition. Use the entity key as your Kafka message key.
In fan-out, ordering within a partition key is preserved as long as your routing does not change the key. If you split a transactions topic into transactions-us and transactions-eu, and you keep the original message keys, then all events for a given transaction ID stay ordered within their regional topic.
In fan-in, global ordering across merged streams is impossible and almost never needed. What you typically need is per-entity ordering - and you get that by routing on entity key. When merging streams from PostgreSQL and MySQL, if both have records for customer 12345, make customer_id the Kafka message key in the unified topic. All events for that customer land in the same partition and stay ordered.
If you truly need cross-source ordering (which is rare), you will need event-time-based windowing in Flink with watermarks. This lets you reorder events across sources based on their event timestamps, at the cost of added latency equal to your watermark delay.
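The mechanics of watermark-based reordering can be shown in a few lines. This is a deliberately simplified model of what Flink does - a single fixed delay, a min-heap as the buffer - not Flink's actual watermark machinery:

```python
# Watermark-style reordering sketch: buffer out-of-order event times and
# only emit those at or below the watermark (max event time seen so far,
# minus a fixed delay). Simplified relative to Flink's real watermarking.
import heapq

def reorder(event_times, watermark_delay):
    buffer, out = [], []
    max_seen = float("-inf")
    for t in event_times:               # arrivals may be out of order
        heapq.heappush(buffer, t)
        max_seen = max(max_seen, t)
        watermark = max_seen - watermark_delay
        while buffer and buffer[0] <= watermark:
            out.append(heapq.heappop(buffer))   # safe to emit, in order
    while buffer:                       # end of stream: flush the rest
        out.append(heapq.heappop(buffer))
    return out

emitted = reorder([5, 1, 3, 2, 8, 6], watermark_delay=3)
```

The latency cost mentioned above is visible here: an event can sit in the buffer until the watermark (which lags the newest event by the delay) catches up to it.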
Error Handling in Multi-Stream Topologies
Fan-out and fan-in create more failure points than a simple linear pipeline. A single slow or failing downstream in a fan-out can cause backpressure that affects all other downstreams. A single poisoned source in a fan-in can corrupt the merged stream.
Here are the patterns that work in production.
Dead Letter Queues
For fan-out, every downstream path should have its own dead letter queue (DLQ). When a message fails to process or write to a destination, it goes to the DLQ instead of blocking the entire pipeline. This isolates failures - a bad record heading to Elasticsearch does not block the Snowflake pipeline.
source-topic --> fan-out --> destination-1 (DLQ: destination-1-dlq)
                         --> destination-2 (DLQ: destination-2-dlq)
                         --> destination-3 (DLQ: destination-3-dlq)
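The isolation property is the whole point: a failure on one path must not touch the others. A minimal Python sketch, with sinks modeled as plain functions and DLQs as lists (all names here are illustrative):

```python
# Per-destination DLQ sketch: a failed write goes to that destination's DLQ
# instead of blocking the other fan-out paths. Sinks and DLQs are modeled
# in memory; all names are illustrative.

def fan_out(record, sinks, topics, dlqs):
    """Try each destination independently; isolate failures per path."""
    for topic in topics:
        try:
            sinks[topic](record)
        except Exception:
            dlqs.setdefault(f"{topic}-dlq", []).append(record)

delivered = {"snowflake": [], "elasticsearch": []}

def to_snowflake(r):
    delivered["snowflake"].append(r)

def to_elasticsearch(r):
    raise RuntimeError("mapping conflict")  # simulate a bad record

dlqs = {}
fan_out({"id": 1},
        {"snowflake": to_snowflake, "elasticsearch": to_elasticsearch},
        ["snowflake", "elasticsearch"], dlqs)
```

The Elasticsearch write fails, the record lands in elasticsearch-dlq, and the Snowflake path never notices.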
Circuit Breakers
When a downstream destination goes down entirely (not just the occasional bad record), you need a circuit breaker. The fan-out job detects repeated failures to a specific destination and temporarily stops sending to it while continuing to serve other downstreams. Once the destination recovers, the circuit breaker resets and replay catches up from the last committed offset.
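A minimal circuit breaker is just a failure counter with a threshold. This sketch omits the half-open probing state that production implementations usually add; the threshold and reset policy are illustrative:

```python
# Minimal circuit breaker sketch: after N consecutive failures to one
# destination, stop sending to it; a success resets the count. Real
# implementations add a half-open probe state; this one is deliberately bare.

class CircuitBreaker:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:       # open circuit = stop sending
        return self.failures >= self.threshold

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0         # destination recovered

cb = CircuitBreaker(threshold=3)
for _ in range(3):
    cb.record_failure()
was_open = cb.open                # skip this destination, serve the others
cb.record_success()
recovered = not cb.open           # resume; replay catches up from the offset
```

One breaker per destination keeps the decision local: tripping the Elasticsearch breaker never affects the Snowflake path.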
Schema Validation at the Boundary
For fan-in, validate schemas at the merge point, not downstream. If a source starts sending events with a changed schema, you want to catch that before it corrupts your unified stream. Flink SQL’s type system helps here - a CAST that fails will surface the problem immediately rather than letting malformed data propagate.
Independent Consumer Groups
In fan-out via Kafka consumer groups, each downstream operates independently. If the Elasticsearch consumer falls behind, it does not affect the Snowflake consumer. This is one of the big advantages of Kafka-based fan-out over application-level fan-out - the broker absorbs the backpressure per consumer group.
Putting It Together with Streamkap
Building these topologies from scratch requires managing Kafka clusters, deploying and monitoring Flink jobs, configuring connectors, and handling schema evolution across all of them. That is a lot of moving parts.
Streamkap handles the infrastructure layer so you can focus on the topology itself. CDC sources feed into managed Kafka, Flink SQL handles the routing and transformation logic, and pre-built destination connectors manage the fan-out to your data warehouse, search index, or application database. The fan-in and fan-out patterns described in this article are not special features - they are natural consequences of how Kafka and Flink work together, and Streamkap makes that combination accessible without the operational overhead.
Whether you are splitting a single CDC stream across ten destinations or merging five databases into one analytics pipeline, the building blocks are the same: topics, consumer groups, SQL transformations, and message keys. Master fan-out and fan-in, and you can model almost any data flow your architecture requires.