Fan-Out and Fan-In Patterns in Stream Processing
How to implement fan-out (one-to-many) and fan-in (many-to-one) patterns in streaming pipelines. Covers topic routing, parallel processing, and stream merging with Kafka and Flink.
If you have spent any time building streaming pipelines, you have run into a version of the same two problems over and over: “I need to send this one stream to many places” and “I need to combine many streams into one place.” These are fan-out and fan-in, and they are two of the most useful patterns in stream processing.
They sound simple, and conceptually they are. But the implementation details - ordering guarantees, schema compatibility, error isolation, backpressure - are where things get interesting. This article breaks down both patterns, walks through practical implementations with Kafka and Flink SQL, and covers the gotchas that bite teams in production.
What Fan-Out Actually Looks Like
Fan-out takes a single input stream and routes events to multiple downstream consumers or topics. The “one-to-many” pattern. You see it everywhere: a CDC stream from your orders table feeding shipping, billing, and analytics systems simultaneously. A single event log splitting into region-specific streams. An ingest pipeline distributing work across parallel processors.
There are three common flavors of fan-out, and picking the right one matters.
Content-Based Routing
This is the most common form. Events are inspected and routed based on their content - a field value, an event type, a header. A single Kafka topic containing mixed event types gets split into dedicated topics for each type.
Say you have a transactions topic where every event includes a region field. Content-based routing sends US transactions to transactions-us, EU transactions to transactions-eu, and APAC transactions to transactions-apac. Each downstream consumer only sees the events it cares about.
In Flink SQL, this is straightforward:
-- Route US transactions to their own topic
INSERT INTO transactions_us
SELECT * FROM transactions
WHERE region = 'US';
-- Route EU transactions
INSERT INTO transactions_eu
SELECT * FROM transactions
WHERE region = 'EU';
Each INSERT INTO statement creates an independent Flink job that reads from the source topic and writes to the destination. The source topic is read once per downstream job, which is fine for moderate fan-out factors (say, 2-10 outputs). For higher fan-out, you will want to look at side outputs or a single job with multiple sinks.
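The same routing decision can live outside Flink, in a small consumer-side router. Here is a minimal Python sketch of content-based routing - the topic names mirror the example above, but the event shape and the default topic are illustrative assumptions, not part of any real pipeline:

```python
# Content-based routing: pick a destination topic from a field value.
# The event dict shape and the "transactions-unrouted" fallback topic
# are illustrative assumptions.

ROUTES = {
    "US": "transactions-us",
    "EU": "transactions-eu",
    "APAC": "transactions-apac",
}

def route(event: dict, default_topic: str = "transactions-unrouted") -> str:
    """Return the destination topic for an event based on its region field."""
    return ROUTES.get(event.get("region"), default_topic)

events = [
    {"id": 1, "region": "US"},
    {"id": 2, "region": "EU"},
    {"id": 3, "region": "LATAM"},  # no dedicated topic -> falls back to default
]
destinations = [route(e) for e in events]
```

The important design choice is the explicit default: events that match no route should go somewhere observable rather than being silently dropped.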
Broadcast
Broadcast sends every event to every downstream consumer, no filtering. This is useful when multiple systems need to see the full stream - for example, when you have both a real-time dashboard and an audit log that need every single event.
In Kafka, broadcast is implicit. Multiple consumer groups reading from the same topic each get every message independently. No special configuration needed. Each consumer group tracks its own offsets and processes at its own pace.
The key thing to understand: broadcast does not copy data. Kafka stores the messages once, and each consumer group simply maintains a separate cursor. This is one of Kafka’s architectural strengths - fan-out via consumer groups is essentially free in terms of storage.
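The cursor model is easy to see in miniature. This Python sketch is a teaching simulation, not a Kafka client: one stored copy of the messages, and each "consumer group" is nothing more than an independent offset:

```python
# Simulation of Kafka-style broadcast via consumer groups: the log is stored
# once, and each group is just a separate cursor into it.

class MiniLog:
    def __init__(self):
        self.messages = []   # single copy of the data
        self.offsets = {}    # group_id -> next offset to read

    def append(self, msg):
        self.messages.append(msg)

    def poll(self, group_id, max_records=10):
        start = self.offsets.get(group_id, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group_id] = start + len(batch)  # "commit" the new offset
        return batch

log = MiniLog()
for i in range(5):
    log.append(f"event-{i}")

dashboard = log.poll("dashboard-group")               # reads all 5 at once
audit_first = log.poll("audit-group", max_records=2)  # same data, its own pace
```

Both groups see every message, the data exists once, and neither group's progress affects the other - which is exactly the property that makes consumer-group fan-out cheap.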
Partition-Based Fan-Out
Sometimes fan-out happens at the partition level. You configure your producer to route events to specific partitions using a custom partitioner, and then assign different consumers to different partition ranges. This is less common than content-based routing but useful when you need strict ordering within a subset of your data.
For example, in a multi-tenant system, you might hash on tenant_id so that all events for a given tenant land in the same partition. Downstream, each consumer handles a specific set of tenants, giving you both parallelism and per-tenant ordering.
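The partitioner itself is a one-liner: a stable hash of the key, modulo the partition count. In this sketch CRC32 stands in for Kafka's default murmur2 hash - the specific hash differs, but the principle (same key, same partition, every time) is identical:

```python
# Keyed partitioning sketch: hash the tenant_id so every event for a tenant
# lands in the same partition. zlib.crc32 is a stand-in for Kafka's default
# murmur2; the partition count of 6 is an arbitrary example.
import zlib

NUM_PARTITIONS = 6

def partition_for(tenant_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a tenant to a partition deterministically."""
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions

# The same tenant always maps to the same partition, so its events stay ordered.
p1 = partition_for("tenant-42")
p2 = partition_for("tenant-42")
```

Because the mapping is deterministic, per-tenant ordering survives producer restarts and rebalances - but note that changing the partition count changes the mapping, which is why partition counts are usually fixed up front for keyed topics.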
What Fan-In Actually Looks Like
Fan-in is the inverse: multiple input streams merging into a single output stream. The “many-to-one” pattern. Think of CDC streams from PostgreSQL, MySQL, and MongoDB all feeding into a unified analytics topic. Or regional event streams from US, EU, and APAC clusters merging into a global view.
Fan-in has its own set of flavors, each with different trade-offs.
Union
Union is the simplest form - take events from multiple sources and write them all to the same destination, preserving every event. No deduplication, no aggregation, just concatenation.
In Flink SQL:
INSERT INTO unified_events
SELECT event_id, event_type, payload, event_time, 'postgres' as source_db
FROM postgres_cdc_stream
UNION ALL
SELECT event_id, event_type, payload, event_time, 'mysql' as source_db
FROM mysql_cdc_stream;
UNION ALL keeps all rows, including duplicates. This is usually what you want in streaming because deduplication across distributed sources is expensive and often unnecessary.
Merge with Schema Alignment
Real-world fan-in rarely involves sources with identical schemas. Your PostgreSQL table might have created_at as a timestamp, while MySQL uses creation_date as a datetime string. Merging these requires a transformation layer that normalizes schemas before writing to the unified stream.
This is where Flink SQL really shines. You can project, cast, and rename fields inline:
INSERT INTO unified_customers
SELECT
  CAST(id AS STRING) as customer_id,
  name,
  email,
  CAST(created_at AS TIMESTAMP(3)) as created_time,
  'postgres' as origin
FROM postgres_customers
UNION ALL
SELECT
  customer_ref as customer_id,
  full_name as name,
  email_address as email,
  CAST(creation_date AS TIMESTAMP(3)) as created_time,
  'mysql' as origin
FROM mysql_customers;
The unified topic has a clean, consistent schema regardless of the source. Downstream consumers never need to know that the data came from different databases with different naming conventions.
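The same normalization sometimes happens in application code, for example in a pre-processing step before events reach Kafka. A Python sketch of the idea - the field names mirror the SQL above, but the mapping functions themselves are illustrative:

```python
# Schema alignment outside SQL: map each source's record shape onto one
# unified shape. Field names follow the SQL example; the functions are
# an illustrative sketch, not a real connector.

def from_postgres(row: dict) -> dict:
    return {
        "customer_id": str(row["id"]),      # id is numeric in Postgres
        "name": row["name"],
        "email": row["email"],
        "origin": "postgres",
    }

def from_mysql(row: dict) -> dict:
    return {
        "customer_id": row["customer_ref"],  # already a string in MySQL
        "name": row["full_name"],
        "email": row["email_address"],
        "origin": "mysql",
    }

pg = from_postgres({"id": 7, "name": "Ada", "email": "ada@example.com"})
my = from_mysql({"customer_ref": "c-9", "full_name": "Grace",
                 "email_address": "grace@example.com"})
```

Whichever source a record came from, the output has the same keys and types - which is the whole contract the unified topic offers its consumers.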
Coalesce
Coalesce goes a step further than union - it merges streams and resolves conflicts. When the same entity appears in multiple source streams, coalesce picks the “winner” based on a rule: latest timestamp, highest priority source, or a custom merge function.
This pattern is common in multi-region architectures where the same customer record might be modified in different regions. You need a single, consistent view. Flink’s windowing and deduplication features handle this:
INSERT INTO canonical_customers
SELECT *
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY event_time DESC
    ) as row_num
  FROM unified_customers
)
WHERE row_num = 1;
This keeps only the most recent event for each customer, regardless of which source stream it came from.
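The imperative equivalent of that ROW_NUMBER() query is a latest-wins merge over a keyed map. A minimal Python sketch, with an illustrative event shape:

```python
# Latest-wins coalesce: keep only the most recent event per customer_id
# across the merged streams. The event dicts are an illustrative shape.

def coalesce_latest(events):
    latest = {}
    for e in events:
        key = e["customer_id"]
        # Replace the stored event only if this one is newer.
        if key not in latest or e["event_time"] > latest[key]["event_time"]:
            latest[key] = e
    return latest

merged = [
    {"customer_id": "c-1", "event_time": 100, "origin": "us-east"},
    {"customer_id": "c-1", "event_time": 250, "origin": "eu-west"},  # newer
    {"customer_id": "c-2", "event_time": 180, "origin": "us-east"},
]
canonical = coalesce_latest(merged)
```

The conflict-resolution rule lives in one comparison; swapping "latest timestamp" for "highest priority source" or a custom merge function changes only that line.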
Practical Architectures
These patterns are not academic exercises. They show up in nearly every production streaming system. Here are two architectures where fan-out and fan-in work together.
Multi-Tenant Data Isolation
A SaaS platform captures events from all tenants into a single Kafka topic. Downstream, different tenants have different requirements: some want their data in Snowflake, others in BigQuery, some need real-time Elasticsearch indexing.
The pipeline looks like this:
- Ingest - All tenant events land in a single events topic, partitioned by tenant_id.
- Fan-out - A Flink job reads the unified topic and routes events to tenant-specific topics based on configuration rules. High-value tenants get dedicated topics; smaller tenants share pooled topics.
- Destination routing - Each tenant-specific or pooled topic feeds the appropriate destination connector.
This pattern gives you the operational simplicity of a single ingest pipeline with the flexibility of per-tenant routing. Streamkap’s connector architecture is built around exactly this kind of topology - a single CDC source feeding multiple destinations through configurable routing.
Microservice Event Aggregation
In a microservice architecture, each service emits its own event stream: order-events, payment-events, inventory-events, shipping-events. An analytics team needs a unified view across all services to build dashboards and run queries.
- Fan-in - A Flink job reads from all four service topics, normalizes the schemas, and writes to a single business-events topic.
- Enrichment - Another Flink job joins the unified stream with reference data (customer profiles, product catalog) to produce enriched events.
- Fan-out - The enriched stream fans out to Snowflake for analytics, Elasticsearch for search, and a real-time dashboard service.
This creates the “diamond topology” - fan-in at the top, processing in the middle, fan-out at the bottom. It is one of the most common and most effective streaming architectures because it keeps the pipeline easy to reason about while supporting diverse downstream needs.
Ordering Considerations
Ordering is where fan-out and fan-in get tricky. Let’s be direct about what you can and cannot guarantee.
Within a single Kafka partition, ordering is guaranteed. Messages are appended in order, and consumers read them in order. This is Kafka’s fundamental contract.
Across partitions, there is no ordering guarantee. Messages in partition 0 and partition 1 are processed independently. If you need ordering for a specific entity (all events for customer 12345), you must ensure all events for that entity land in the same partition. Use the entity key as your Kafka message key.
In fan-out, ordering within a partition key is preserved as long as your routing does not change the key. If you split a transactions topic into transactions-us and transactions-eu, and you keep the original message keys, then all events for a given transaction ID stay ordered within their regional topic.
In fan-in, global ordering across merged streams is impossible and almost never needed. What you typically need is per-entity ordering - and you get that by routing on entity key. When merging streams from PostgreSQL and MySQL, if both have records for customer 12345, make customer_id the Kafka message key in the unified topic. All events for that customer land in the same partition and stay ordered.
If you truly need cross-source ordering (which is rare), you will need event-time-based windowing in Flink with watermarks. This lets you reorder events across sources based on their event timestamps, at the cost of added latency equal to your watermark delay.
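The mechanics of watermark-based reordering can be shown in a few lines. This is a deliberately simplified model of what Flink does - a single fixed delay, a min-heap as the buffer - not Flink's actual watermark machinery:

```python
# Watermark-style reordering sketch: buffer out-of-order event times and
# only emit those at or below the watermark (max event time seen so far,
# minus a fixed delay). Simplified relative to Flink's real watermarking.
import heapq

def reorder(event_times, watermark_delay):
    buffer, out = [], []
    max_seen = float("-inf")
    for t in event_times:               # arrivals may be out of order
        heapq.heappush(buffer, t)
        max_seen = max(max_seen, t)
        watermark = max_seen - watermark_delay
        while buffer and buffer[0] <= watermark:
            out.append(heapq.heappop(buffer))   # safe to emit, in order
    while buffer:                       # end of stream: flush the rest
        out.append(heapq.heappop(buffer))
    return out

emitted = reorder([5, 1, 3, 2, 8, 6], watermark_delay=3)
```

The latency cost mentioned above is visible here: an event can sit in the buffer until the watermark (which lags the newest event by the delay) catches up to it.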
Error Handling in Multi-Stream Topologies
Fan-out and fan-in create more failure points than a simple linear pipeline. A single slow or failing downstream in a fan-out can cause backpressure that affects all other downstreams. A single poisoned source in a fan-in can corrupt the merged stream.
Here are the patterns that work in production.
Dead Letter Queues
For fan-out, every downstream path should have its own dead letter queue (DLQ). When a message fails to process or write to a destination, it goes to the DLQ instead of blocking the entire pipeline. This isolates failures - a bad record heading to Elasticsearch does not block the Snowflake pipeline.
source-topic --> fan-out --> destination-1 (DLQ: destination-1-dlq)
                         --> destination-2 (DLQ: destination-2-dlq)
                         --> destination-3 (DLQ: destination-3-dlq)
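The isolation property is the whole point: a failure on one path must not touch the others. A minimal Python sketch, with sinks modeled as plain functions and DLQs as lists (all names here are illustrative):

```python
# Per-destination DLQ sketch: a failed write goes to that destination's DLQ
# instead of blocking the other fan-out paths. Sinks and DLQs are modeled
# in memory; all names are illustrative.

def fan_out(record, sinks, topics, dlqs):
    """Try each destination independently; isolate failures per path."""
    for topic in topics:
        try:
            sinks[topic](record)
        except Exception:
            dlqs.setdefault(f"{topic}-dlq", []).append(record)

delivered = {"snowflake": [], "elasticsearch": []}

def to_snowflake(r):
    delivered["snowflake"].append(r)

def to_elasticsearch(r):
    raise RuntimeError("mapping conflict")  # simulate a bad record

dlqs = {}
fan_out({"id": 1},
        {"snowflake": to_snowflake, "elasticsearch": to_elasticsearch},
        ["snowflake", "elasticsearch"], dlqs)
```

The Elasticsearch write fails, the record lands in elasticsearch-dlq, and the Snowflake path never notices.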
Circuit Breakers
When a downstream destination goes down entirely (not just the occasional bad record), you need a circuit breaker. The fan-out job detects repeated failures to a specific destination and temporarily stops sending to it while continuing to serve other downstreams. Once the destination recovers, the circuit breaker resets and replay catches up from the last committed offset.
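A minimal circuit breaker is just a failure counter with a threshold. This sketch omits the half-open probing state that production implementations usually add; the threshold and reset policy are illustrative:

```python
# Minimal circuit breaker sketch: after N consecutive failures to one
# destination, stop sending to it; a success resets the count. Real
# implementations add a half-open probe state; this one is deliberately bare.

class CircuitBreaker:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:       # open circuit = stop sending
        return self.failures >= self.threshold

    def record_failure(self):
        self.failures += 1

    def record_success(self):
        self.failures = 0         # destination recovered

cb = CircuitBreaker(threshold=3)
for _ in range(3):
    cb.record_failure()
was_open = cb.open                # skip this destination, serve the others
cb.record_success()
recovered = not cb.open           # resume; replay catches up from the offset
```

One breaker per destination keeps the decision local: tripping the Elasticsearch breaker never affects the Snowflake path.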
Schema Validation at the Boundary
For fan-in, validate schemas at the merge point, not downstream. If a source starts sending events with a changed schema, you want to catch that before it corrupts your unified stream. Flink SQL’s type system helps here - a CAST that fails will surface the problem immediately rather than letting malformed data propagate.
Independent Consumer Groups
In fan-out via Kafka consumer groups, each downstream operates independently. If the Elasticsearch consumer falls behind, it does not affect the Snowflake consumer. This is one of the big advantages of Kafka-based fan-out over application-level fan-out - the broker absorbs the backpressure per consumer group.
Putting It Together with Streamkap
Building these topologies from scratch requires managing Kafka clusters, deploying and monitoring Flink jobs, configuring connectors, and handling schema evolution across all of them. That is a lot of moving parts.
Streamkap handles the infrastructure layer so you can focus on the topology itself. CDC sources feed into managed Kafka, Flink SQL handles the routing and transformation logic, and pre-built destination connectors manage the fan-out to your data warehouse, search index, or application database. The fan-in and fan-out patterns described in this article are not special features - they are natural consequences of how Kafka and Flink work together, and Streamkap makes that combination accessible without the operational overhead.
Whether you are splitting a single CDC stream across ten destinations or merging five databases into one analytics pipeline, the building blocks are the same: topics, consumer groups, SQL transformations, and message keys. Master fan-out and fan-in, and you can model almost any data flow your architecture requires.