Apache Flink Use Cases: Real-World Stream Processing Examples
Explore real-world Apache Flink use cases across fraud detection, IoT, recommendations, and ETL - with a decision matrix for when Flink is the right choice.
Introduction
Apache Flink is the dominant open-source framework for stateful stream processing. Since graduating from the Apache Incubator to a top-level Apache Software Foundation project and becoming the backbone of major real-time data platforms at Alibaba, Netflix, Uber, and LinkedIn, Flink has become the reference implementation for what production-grade stream processing looks like.
But understanding when to use Flink - and equally, when not to - requires moving past the marketing material and into concrete use cases. This guide catalogs the most common real-world Flink applications, explains why Flink fits each problem, and provides a decision framework for teams evaluating whether Flink belongs in their architecture.
What Makes Flink Uniquely Capable
Before looking at use cases, it helps to understand the handful of capabilities that make Flink the right tool for certain problems:
- True event-time processing - Flink computes windows and aggregations based on when events occurred, not when they were processed. Critical for any use case involving late-arriving data, reordering, or replay.
- Exactly-once semantics - With checkpointing enabled, Flink guarantees that each event affects state exactly once, even after failures. Essential for financial and audit use cases.
- Keyed state - Flink maintains per-key state (e.g., per-user, per-device) that persists across events. This enables stateful joins and accumulations that are impractical in stateless systems.
- Rich windowing - Tumbling, sliding, session, and custom windows allow flexible time-based aggregations.
- CEP (Complex Event Processing) - The Flink CEP library supports pattern matching across event sequences with time constraints, enabling fraud and anomaly detection patterns.
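To make the windowing bullet concrete, the arithmetic behind window assignment is simple: a tumbling window of size S places each event in exactly one bucket, while a sliding window of size S advancing every P milliseconds places it in S/P overlapping buckets. The sketch below is not the Flink API, just the underlying bucket math:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative window-assignment arithmetic (not the Flink API): shows how
// tumbling and sliding windows map an event timestamp to window buckets.
public class WindowAssignment {

    // Tumbling window of `sizeMs`: each event falls in exactly one bucket.
    public static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Sliding window of `sizeMs` advancing every `slideMs`: each event
    // falls in sizeMs / slideMs overlapping buckets.
    public static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at t=125s in 60s tumbling windows lands in the [120s, 180s) bucket.
        System.out.println(tumblingWindowStart(125_000, 60_000)); // 120000
        // In a 15-minute window sliding every minute, it belongs to 15 buckets.
        System.out.println(slidingWindowStarts(125_000, 900_000, 60_000).size()); // 15
    }
}
```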
Use Case 1: Fraud Detection
The Problem
A payment network processes millions of transactions per second. Fraudulent transactions often follow recognizable patterns: multiple rapid transactions from a single card, geographic impossibilities (card used in New York and London within minutes), or unusual merchant category combinations. Traditional batch fraud detection runs nightly - by the time a fraudulent pattern is flagged, dozens of transactions have already cleared.
The Streaming Solution
Flink maintains per-card state (recent transaction history, velocity counters, location) and applies fraud rules as each transaction arrives. A rule engine evaluates conditions across the state and either approves the transaction or flags it for review within 50–100ms.
Why Flink Fits
The Flink CEP library is purpose-built for this pattern. A pattern definition for velocity fraud looks like:
Pattern<Transaction, ?> velocityPattern = Pattern
.<Transaction>begin("first")
.where(new SimpleCondition<Transaction>() {
public boolean filter(Transaction t) {
return t.getAmount() > 100.0;
}
})
.next("second")
.where(new SimpleCondition<Transaction>() {
public boolean filter(Transaction t) {
return t.getAmount() > 100.0;
}
})
.within(Time.seconds(60));
This pattern detects two consecutive transactions over $100 from the same card within 60 seconds, using event time (the transaction timestamp), not processing time. Note that next requires strict contiguity - followedBy would relax the pattern to allow unrelated transactions in between. Late-arriving events - common in distributed payment networks - are handled correctly via watermarks.
Keyed state partitioned by card_id means each card’s history is independently maintained across thousands of parallel task instances, providing horizontal scalability without coordination overhead.
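The state the CEP pattern relies on can be sketched without Flink at all. The class below is a conceptual stand-in: a plain HashMap plays the role of keyed state partitioned by card_id (in a real job, Flink shards this state across parallel task instances and checkpoints it for fault tolerance):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of the velocity rule, with a HashMap standing in for
// Flink's keyed state. Flags the second large transaction within 60s.
public class VelocityRule {
    private static final double AMOUNT_THRESHOLD = 100.0;
    private static final long WINDOW_MS = 60_000;

    // Per-card timestamps of recent large transactions (the "keyed state").
    private final Map<String, Deque<Long>> recentLargeTxns = new HashMap<>();

    // Returns true if this transaction is a second large one within 60s.
    public boolean isSuspicious(String cardId, double amount, long eventTimeMs) {
        if (amount <= AMOUNT_THRESHOLD) {
            return false;
        }
        Deque<Long> history = recentLargeTxns.computeIfAbsent(cardId, k -> new ArrayDeque<>());
        // Expire timestamps that fell out of the 60-second window.
        while (!history.isEmpty() && eventTimeMs - history.peekFirst() > WINDOW_MS) {
            history.pollFirst();
        }
        boolean suspicious = !history.isEmpty();
        history.addLast(eventTimeMs);
        return suspicious;
    }

    public static void main(String[] args) {
        VelocityRule rule = new VelocityRule();
        System.out.println(rule.isSuspicious("card-1", 150.0, 1_000));  // false: first large txn
        System.out.println(rule.isSuspicious("card-1", 200.0, 30_000)); // true: second within 60s
        System.out.println(rule.isSuspicious("card-1", 50.0, 31_000));  // false: under threshold
    }
}
```

Because each card's history is independent, this logic parallelizes trivially - which is exactly the property Flink's keyed state exploits.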
Use Case 2: Real-Time Recommendations
The Problem
An e-commerce platform wants to update product recommendations the moment a user interacts with the site - not on the next page load based on a batch model run 24 hours ago. A user who just added a competitor product to their cart should see cross-sell recommendations immediately, not tomorrow.
The Streaming Solution
Flink consumes a stream of user interaction events (views, clicks, add-to-cart, purchases) and maintains a continuously updated feature vector per user. A recommendation model - either embedded in Flink or called via an external model-serving endpoint - produces updated recommendations that are written to a low-latency store (Redis, DynamoDB) consulted by the frontend.
Why Flink Fits
The session window is the key primitive here. User sessions are natural windows with activity-defined boundaries - a session ends after N minutes of inactivity. Flink’s session windows handle this without requiring a fixed time boundary:
-- Aggregate user activity within a session window
SELECT
user_id,
SESSION_START(event_time, INTERVAL '30' MINUTE) AS session_start,
SESSION_END(event_time, INTERVAL '30' MINUTE) AS session_end,
COLLECT(product_id) AS viewed_products,
COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases
FROM user_events
GROUP BY user_id, SESSION(event_time, INTERVAL '30' MINUTE);
Combined with keyed state that accumulates the user’s full interaction history (not just within the current session), Flink can power recommendation features that are aware of both session context and long-term behavior.
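The session-gap rule behind SESSION(event_time, INTERVAL '30' MINUTE) can be sketched in a few lines of plain Java: consecutive events less than the gap apart belong to one session, and a larger gap closes the session and starts a new one. (Flink applies this per key and uses watermarks to decide when a session can safely be finalized.)

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of session-window semantics: group timestamps into sessions
// separated by gaps larger than `gapMs`. Not the Flink API - just the rule.
public class Sessionizer {

    // Groups sorted event timestamps into sessions using a gap of `gapMs`.
    public static List<List<Long>> sessionize(List<Long> sortedTimestamps, long gapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                sessions.add(current);      // gap exceeded: close the session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        long gap = 30 * 60_000; // 30-minute inactivity gap
        // Three clicks in quick succession, then one two hours later:
        List<Long> events = List.of(0L, 60_000L, 120_000L, 7_200_000L);
        System.out.println(sessionize(events, gap).size()); // 2 sessions
    }
}
```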
Use Case 3: IoT Sensor Processing
The Problem
A manufacturing plant runs 10,000 sensors monitoring temperature, pressure, vibration, and power consumption on industrial equipment. An anomaly - a temperature reading two standard deviations above the rolling average - can precede equipment failure by 15–30 minutes. If operators are notified in real time, they can take preventive action. If the alert arrives in the next morning’s batch report, it is too late.
The Streaming Solution
Flink aggregates sensor readings in tumbling windows (e.g., 1-minute averages) and applies statistical anomaly detection against rolling baselines maintained in keyed state. Alerts are emitted immediately when thresholds are crossed and routed to an alerting service or operator dashboard.
Why Flink Fits
IoT workloads generate extremely high event volumes from devices with variable connectivity. Devices that reconnect after a network outage may emit a burst of historical readings out of order. Flink’s watermark mechanism handles this gracefully:
WatermarkStrategy.<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(30))
.withTimestampAssigner((event, timestamp) -> event.getSensorTimestamp());
This tells Flink to wait up to 30 seconds for out-of-order events before finalizing window results - matching the typical reconnection delay for the device fleet.
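The arithmetic behind forBoundedOutOfOrderness is worth seeing in isolation. The simplified model below (not the Flink API, which also handles periodic emission and an exclusive-bound adjustment) shows the core rule: the watermark trails the highest event timestamp seen so far by the configured bound, and late events never move it backwards:

```java
// Simplified model of a bounded-out-of-orderness watermark: the watermark
// trails the max timestamp seen by `boundMs`, so with a 30s bound the
// window [0, 60s) is only finalized once an event >= 90s arrives.
public class BoundedOutOfOrdernessModel {
    private final long boundMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public BoundedOutOfOrdernessModel(long boundMs) {
        this.boundMs = boundMs;
    }

    // Observe an event; return the current watermark (the event time up to
    // which the stream is considered complete).
    public long onEvent(long eventTimestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
        return maxTimestampSeen - boundMs;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessModel wm = new BoundedOutOfOrdernessModel(30_000);
        System.out.println(wm.onEvent(100_000)); // 70000: results up to 70s are final
        System.out.println(wm.onEvent(80_000));  // 70000: a late event never moves it back
    }
}
```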
Per-sensor state maintains the rolling statistics needed for anomaly detection without requiring a database lookup on every event:
// Rolling mean and variance using Welford's online algorithm
ValueState<SensorBaseline> baselineState = getRuntimeContext()
.getState(new ValueStateDescriptor<>("baseline", SensorBaseline.class));
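A minimal sketch of what such a baseline object might contain (SensorBaseline is a name assumed from the snippet above, not a Flink class): Welford's online algorithm maintains a running mean and variance in O(1) memory, so each new reading can be tested against the baseline without rescanning history or querying a database.

```java
// Sketch of a per-sensor baseline using Welford's online algorithm:
// running mean and variance updated one reading at a time.
public class SensorBaseline {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Welford's update step: incorporate one new reading.
    public void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
    }

    public double stdDev() {
        return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1)); // sample std dev
    }

    // Flag a reading more than two standard deviations from the mean.
    public boolean isAnomalous(double value) {
        return count >= 2 && Math.abs(value - mean) > 2 * stdDev();
    }

    public static void main(String[] args) {
        SensorBaseline baseline = new SensorBaseline();
        for (double reading : new double[] {70.0, 71.0, 69.0, 70.5, 70.2}) {
            baseline.add(reading); // normal temperature readings
        }
        System.out.println(baseline.isAnomalous(70.4)); // false: within baseline
        System.out.println(baseline.isAnomalous(85.0)); // true: far outside 2 sigma
    }
}
```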
Use Case 4: Clickstream Analytics
The Problem
A media company wants to understand, in real time, which articles are trending, how users navigate across content, and where users abandon their reading sessions. Waiting for a daily batch job means editorial decisions lag audience behavior by 24 hours.
The Streaming Solution
Flink consumes page view events from the browser (via Kafka), aggregates them in sliding windows (e.g., article view count over the last 15 minutes, refreshed every minute), and writes results to a dashboard store queried by the editorial team.
-- Trending articles: sliding window count
SELECT
article_id,
COUNT(DISTINCT user_id) AS unique_readers,
COUNT(*) AS total_views,
AVG(CAST(scroll_depth AS DOUBLE)) AS avg_scroll_pct,
HOP_START(event_time, INTERVAL '1' MINUTE,
INTERVAL '15' MINUTE) AS window_start
FROM page_view_events
GROUP BY article_id,
HOP(event_time, INTERVAL '1' MINUTE, INTERVAL '15' MINUTE);
Why Flink Fits
The combination of sliding windows and high-cardinality grouping (one result per article per window) is exactly where Flink’s parallel, stateful execution shines. A single Flink job can handle millions of page views per minute and emit fresh trending scores every 60 seconds with sub-second end-to-end latency.
Use Case 5: Log Analytics and SIEM
The Problem
A security operations team needs to detect attack patterns - brute force login attempts, port scans, lateral movement - in real time across millions of log events per second from servers, firewalls, and applications. Storing everything in a SIEM and querying it periodically is too slow; attackers pivot in seconds.
The Streaming Solution
Flink processes raw log events, parses them into structured records, and applies security rules using CEP patterns. Alerts are emitted when patterns are detected and routed to the SOC team’s incident management system.
A brute force detection pattern:
Pattern<LoginEvent, ?> bruteForcePattern = Pattern
    .<LoginEvent>begin("failed_logins")
    .where(SimpleCondition.of(evt -> evt.getResult().equals("FAILED")))
    .timesOrMore(5)
    .greedy()
    .next("success")
    .where(SimpleCondition.of(evt -> evt.getResult().equals("SUCCESS")))
    .within(Time.minutes(5));
Why Flink Fits
Security workloads require extremely low latency (alerts in seconds, not minutes), high throughput (every log event must be processed), and exactly-once guarantees (a missed alert is a security incident). Flink’s fault-tolerance through checkpointing ensures that a task manager failure does not result in unprocessed events.
Use Case 6: Customer 360 - Real-Time Profile Updates
The Problem
A retailer maintains customer profiles that combine data from multiple source systems: e-commerce orders, in-store purchases, loyalty program activity, and email engagement. Today these profiles are rebuilt nightly in a batch job. A customer who just made a high-value purchase is not recognized as a VIP customer until the next day.
The Streaming Solution
Flink joins change streams from multiple source systems on customer_id, maintaining a materialized customer profile in keyed state that reflects the current state across all systems. Profile updates are written to the CRM and personalization platform within seconds of the source event.
-- Real-time customer profile enrichment via stream-stream join
SELECT
o.customer_id,
o.order_id,
o.order_total,
c.loyalty_tier,
c.total_lifetime_value + o.order_total AS updated_ltv
FROM order_stream AS o
JOIN customer_profile_stream AS c
ON o.customer_id = c.customer_id
AND c.rowtime BETWEEN o.rowtime - INTERVAL '1' HOUR
AND o.rowtime + INTERVAL '1' HOUR;
Why Flink Fits
Joining multiple streams with different latencies (the loyalty system may be slower than the order system) requires temporal join semantics - matching events within a time window rather than requiring exact timestamp alignment. Flink’s interval joins and temporal table joins handle this correctly.
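The matching rule an interval join applies can be sketched directly: an order at time t joins profile events on the same key whose timestamps fall in [t - 1h, t + 1h]. The class below models only the semantics (in a real job, Flink buffers both streams in keyed state and purges entries once the watermark passes the upper bound):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of interval-join semantics: same key, timestamp within the bound.
public class IntervalJoinModel {

    static class Event {
        final String customerId;
        final long timestampMs;
        Event(String customerId, long timestampMs) {
            this.customerId = customerId;
            this.timestampMs = timestampMs;
        }
    }

    // Returns the profile events that join with the given order.
    public static List<Event> matches(Event order, List<Event> profiles, long boundMs) {
        List<Event> joined = new ArrayList<>();
        for (Event p : profiles) {
            boolean sameKey = p.customerId.equals(order.customerId);
            boolean inWindow = p.timestampMs >= order.timestampMs - boundMs
                    && p.timestampMs <= order.timestampMs + boundMs;
            if (sameKey && inWindow) {
                joined.add(p);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        Event order = new Event("c1", 10 * hour);
        List<Event> profiles = List.of(
                new Event("c1", 10 * hour - 30 * 60_000), // 30 min earlier: joins
                new Event("c1", 12 * hour),               // 2 hours later: outside window
                new Event("c2", 10 * hour));              // different customer: no match
        System.out.println(matches(order, profiles, hour).size()); // 1
    }
}
```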
Use Case 7: Real-Time ETL and Data Pipeline Enrichment
The Problem
A data team needs to route, transform, filter, and enrich data as it moves from operational systems to the data warehouse and analytical stores. Today this is handled by batch dbt jobs - but the warehouse is always hours stale, and some consumers need fresh data.
The Streaming Solution
Flink acts as the transformation layer: reading from Kafka topics that carry CDC events, applying SQL transforms, joining with reference data (loaded from a database or broadcast variable), and writing enriched records to the target sink.
Platforms like Streamkap build on this pattern - using Flink as the processing engine to handle the routing, transformation, and delivery logic that teams would otherwise need to build and operate themselves.
-- Enrich order events with product catalog data
SELECT
o.order_id,
o.customer_id,
o.product_id,
p.product_name,
p.category,
p.supplier_id,
o.quantity,
o.quantity * p.unit_cost AS cost_of_goods
FROM order_events AS o
JOIN product_catalog FOR SYSTEM_TIME AS OF o.event_time AS p
ON o.product_id = p.product_id;
Use Case 8: Notification Engines
The Problem
A SaaS product needs to send real-time notifications: “Your export is ready,” “Your trial expires in 3 days,” “A teammate mentioned you.” These must be sent within seconds of the triggering event and must not be sent more than once (exactly-once delivery).
The Streaming Solution
Flink processes event streams, applies business rules (including deduplication and rate limiting using keyed state), and emits notification payloads to a delivery service (email, push, SMS). Keyed state per user_id enforces rate limits and tracks which notifications have already been sent.
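The deduplication and rate-limiting rules can be sketched in plain Java, with HashMaps standing in for per-user keyed state (in a real Flink job this state survives failures via checkpoints, which is what makes the "never sent twice" guarantee hold across restarts):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Conceptual sketch of a notification gate: per-user dedup plus a
// per-user rate limit, with HashMaps in place of Flink's keyed state.
public class NotificationGate {
    private static final long RATE_WINDOW_MS = 60_000;
    private static final int MAX_PER_WINDOW = 3;

    private final Map<String, Set<String>> sentIds = new HashMap<>();   // dedup per user
    private final Map<String, Integer> windowCounts = new HashMap<>();  // rate limit per user
    private final Map<String, Long> windowStarts = new HashMap<>();

    // Returns true if the notification should be delivered.
    public boolean shouldSend(String userId, String notificationId, long nowMs) {
        // Deduplicate: never deliver the same notification id twice.
        if (!sentIds.computeIfAbsent(userId, k -> new HashSet<>()).add(notificationId)) {
            return false;
        }
        // Rate limit: at most MAX_PER_WINDOW notifications per window per user.
        long windowStart = windowStarts.getOrDefault(userId, 0L);
        if (nowMs - windowStart > RATE_WINDOW_MS) {
            windowStarts.put(userId, nowMs); // start a fresh window
            windowCounts.put(userId, 0);
        }
        int count = windowCounts.merge(userId, 1, Integer::sum);
        return count <= MAX_PER_WINDOW;
    }

    public static void main(String[] args) {
        NotificationGate gate = new NotificationGate();
        System.out.println(gate.shouldSend("u1", "export-ready-42", 1_000)); // true
        System.out.println(gate.shouldSend("u1", "export-ready-42", 2_000)); // false: duplicate
        System.out.println(gate.shouldSend("u1", "mention-7", 3_000));       // true
    }
}
```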
When Flink Is - and Is Not - the Right Choice
Decision Matrix
| Scenario | Flink? | Alternative |
|---|---|---|
| Sub-second latency required | Yes | - |
| Stateful aggregations across event streams | Yes | - |
| Complex event pattern matching (CEP) | Yes | - |
| Exactly-once semantics required | Yes | - |
| Simple fan-out / routing with no state | No | Kafka Connect, simple consumer |
| Batch-style processing, latency > 5 min | No | Spark, dbt |
| Small data team, no platform engineering | No | Managed streaming (Streamkap, Confluent) |
| Prototyping / low volume | No | Kafka Streams (embedded), ksqlDB |
| ML model training (not serving) | No | Spark MLlib, Ray |
Flink Is Right When
- You need to maintain state across events and that state grows large (GBs to TBs in aggregate across keys)
- Event time correctness is non-negotiable (financial, audit, compliance workloads)
- You are joining multiple streams with different arrival times
- You need sub-second latency with fault tolerance
- You have a platform team that can operate Flink (or you are using a managed service)
Flink Is Wrong When
- Your pipeline is stateless (simple transforms, routing, enrichment from a cache)
- Your latency requirement is measured in minutes, not seconds
- You need to move fast and your team does not have Flink expertise
- Your data volume does not justify the operational overhead
Summary
Apache Flink’s combination of event-time processing, stateful computation, exactly-once semantics, and CEP capabilities makes it the strongest choice for a specific class of problems: real-time fraud detection, recommendation engines, IoT anomaly detection, streaming ETL, and Customer 360 profile maintenance.
The key to successful Flink adoption is matching the right use case to the tool. Not every streaming problem is a Flink problem. But when the use case requires stateful, fault-tolerant, low-latency processing at scale - Flink is the reference implementation.
For teams starting out, managed platforms that abstract Flink’s operational complexity (cluster management, checkpointing, upgrades) significantly reduce the time to production. Understand the patterns first, then decide how much of the infrastructure layer you want to own.
For related reading, see the guide on real-time data pipeline architecture and the overview of Change Data Capture for streaming ETL.