
Engineering

February 25, 2026

12 min read

Real-Time Fraud Detection with Apache Flink

How to build a real-time fraud detection system using Apache Flink. Covers rule-based detection, windowed aggregations, pattern matching, and ML model scoring.

TL;DR:

  • Fraud detection requires sub-second response times that only stream processing can deliver.
  • Flink provides the building blocks: windowed aggregations for velocity checks, CEP for pattern matching, and async I/O for ML model scoring.
  • A layered approach combining rules, aggregations, and ML models catches more fraud than any single method.
  • CDC from payment and account databases feeds the pipeline with the transactional context needed for accurate detection.

A fraudulent transaction takes less than a second to execute. Your detection system needs to be faster.

That simple constraint shapes every architectural decision in a fraud detection pipeline. Batch processing, where you run queries against yesterday’s transactions overnight, only tells you what already happened. The money has moved. The account is drained. You are left writing incident reports instead of blocking transactions.

Stream processing flips this model. Every transaction is evaluated the moment it arrives, and a decision is returned before the payment settles. Apache Flink is particularly well-suited for this work because it was built from the ground up for stateful stream processing, giving you the primitives you need: time windows, pattern matching, exactly-once state management, and low-latency event handling.

This guide walks through the building blocks of a Flink-based fraud detection system, from simple rules to ML model scoring, with code examples you can adapt for your own pipeline.

Why Latency Kills You in Fraud Detection

The economics of fraud detection are brutally simple. Every millisecond of added latency between a transaction event and your accept/reject decision is a window where money can disappear. The longer detection takes, the more fraud completes before you can act. Detect in 50ms, and you block the transaction before authorization. Detect in 50 minutes, and you are filing a chargeback.

Batch systems are not just slow. They are architecturally incapable of solving this problem. A nightly batch job processes transactions from hours ago. Even micro-batch systems running every 5 minutes introduce enough delay for a determined attacker to execute dozens of fraudulent transactions and vanish.

Flink processes events one at a time (or in very small groups) with true streaming semantics. There is no waiting for a batch window to close. Each transaction hits your detection logic within milliseconds of arriving on the stream.

The Four Flink Primitives

Before diving into specific detection strategies, it helps to understand the four Flink primitives that make real-time fraud detection possible.

Keyed State

Flink lets you maintain per-key state that survives failures. For fraud detection, this means you can track per-user metrics (transaction counts, running totals, last-known location) without an external database lookup. The state lives in-process, co-located with your computation, which keeps latency in the microsecond range.

Time Windows

Windows let you aggregate events over defined time intervals. A sliding window of 5 minutes, advancing every 30 seconds, can track how many transactions a given card has processed recently. Tumbling windows can compute hourly spend totals. Session windows can group bursts of activity together. All of these are first-class concepts in Flink, not something you need to build from scratch.
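To make the mechanics concrete, here is a plain-Python sketch (not the Flink API) of how a tumbling window buckets events: each timestamp is aligned to the start of its window, and counts accumulate per (card, window) key.

```python
from collections import defaultdict

WINDOW_SIZE_MS = 5 * 60 * 1000  # 5-minute tumbling windows

def window_start(ts_ms: int) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return ts_ms - (ts_ms % WINDOW_SIZE_MS)

# (card_id, window_start) -> transaction count; Flink keeps this as keyed state
counts = defaultdict(int)

def on_transaction(card_id: str, ts_ms: int) -> int:
    """Count transactions per card per window and return the running count."""
    key = (card_id, window_start(ts_ms))
    counts[key] += 1
    return counts[key]

# Three events for one card land inside the same 5-minute window:
assert on_transaction("card-1", 1_000) == 1
assert on_transaction("card-1", 2_000) == 2
assert on_transaction("card-1", 299_999) == 3
# A fourth event falls into the next window, so the count starts over:
assert on_transaction("card-1", 300_000) == 1
```

Flink adds what this sketch omits: window expiry, checkpointed state, and event-time semantics with watermarks.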

Complex Event Processing (CEP)

Flink’s CEP library and its SQL-level equivalent, MATCH_RECOGNIZE, let you define patterns over sequences of events. This is where you express rules like “three failed logins followed by a password reset followed by a high-value transfer within 10 minutes.” Pattern matching over event streams is something that would require hundreds of lines of custom stateful code without a dedicated abstraction.

Async I/O

Not everything can live in-process. Sometimes you need to call an external ML model, query a blocklist service, or look up device reputation scores. Flink’s async I/O operator lets you make these external calls without blocking the processing pipeline. Requests go out in parallel, and results are folded back into the stream as they return.

Layer 1: Rule-Based Detection

The simplest fraud detection layer is a set of hard rules. These are deterministic checks that flag obviously suspicious activity. They are fast to implement, easy to explain to compliance teams, and catch a surprising amount of fraud on their own.

Here is a Flink SQL query that flags any single transaction over $10,000, or any transaction from a country that does not match the cardholder’s registered country:

SELECT
  transaction_id,
  card_id,
  amount,
  merchant_country,
  cardholder_country,
  CASE
    WHEN amount > 10000 THEN 'HIGH_VALUE'
    WHEN merchant_country <> cardholder_country THEN 'CROSS_BORDER'
  END AS rule_triggered
FROM transactions
WHERE amount > 10000
   OR merchant_country <> cardholder_country;

This is the kind of rule that runs in microseconds per event. No external calls, no windowed aggregation, just a filter over the stream. In practice, you would have dozens of these rules running in parallel, each one writing flagged transactions to a side output for further evaluation.
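In plain Python (illustrative, not the Flink API), such a rule layer is just a set of named predicates evaluated against each event; the rule names below mirror the SQL above.

```python
# Hypothetical rule set mirroring the SQL query; each rule is a named predicate.
RULES = {
    "HIGH_VALUE": lambda tx: tx["amount"] > 10_000,
    "CROSS_BORDER": lambda tx: tx["merchant_country"] != tx["cardholder_country"],
}

def evaluate(tx: dict) -> list[str]:
    """Return the names of every rule the transaction triggers."""
    return [name for name, check in RULES.items() if check(tx)]

tx = {"amount": 12_500, "merchant_country": "BR", "cardholder_country": "US"}
assert evaluate(tx) == ["HIGH_VALUE", "CROSS_BORDER"]
```

Adding a rule is one dictionary entry, which is what keeps this layer cheap to maintain and easy to audit.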

The limitation of pure rules is that they are static. A $10,000 threshold might be normal for a corporate card but wildly suspicious for a student’s debit card. That is where the next layer comes in.

Layer 2: Windowed Aggregations for Velocity Checks

Velocity checks answer the question: “Is this user doing something unusual relative to their recent behavior?” This requires maintaining running aggregations over time windows.

A classic velocity check counts transactions per card over a short tumbling window:

SELECT
  card_id,
  COUNT(*) AS tx_count,
  SUM(amount) AS tx_total,
  TUMBLE_END(event_time, INTERVAL '5' MINUTE) AS window_end
FROM transactions
GROUP BY
  card_id,
  TUMBLE(event_time, INTERVAL '5' MINUTE)
HAVING COUNT(*) > 5 OR SUM(amount) > 5000;

This query groups transactions by card into 5-minute tumbling windows and flags any card with more than 5 transactions or more than $5,000 in total spend within that window.

You can layer multiple windows for different time horizons. A 5-minute window catches rapid-fire card testing. A 1-hour window catches slower distributed attacks. A 24-hour window catches gradual account draining. Each window runs independently, and their outputs can be combined downstream.
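One way to picture the multi-horizon idea, in plain Python rather than Flink SQL (the horizon names and amounts are illustrative):

```python
from collections import deque

HORIZONS_MS = {"5m": 300_000, "1h": 3_600_000, "24h": 86_400_000}

# Per-card event log; in Flink each horizon would be its own windowed aggregation.
events: deque = deque()  # (ts_ms, amount) for one card

def spend_by_horizon(now_ms: int) -> dict:
    """Sum recent spend over each horizon, dropping events older than the longest."""
    while events and now_ms - events[0][0] > HORIZONS_MS["24h"]:
        events.popleft()
    return {
        name: sum(amt for ts, amt in events if now_ms - ts <= span)
        for name, span in HORIZONS_MS.items()
    }

events.extend([(0, 100.0), (400_000, 50.0)])
totals = spend_by_horizon(500_000)
# The old event has aged out of the 5-minute horizon but not the longer ones:
assert totals == {"5m": 50.0, "1h": 150.0, "24h": 150.0}
```

Each horizon's output can then be compared against its own threshold downstream, exactly as the independent windows in the Flink job are.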

Flink manages the windowed state automatically. When a window expires, the state is cleaned up. When a checkpoint happens, the in-flight window state is persisted. You do not need to worry about memory leaks or state corruption.

Layer 3: Pattern Matching with MATCH_RECOGNIZE

Some fraud patterns are sequential. They are not about any single event being suspicious, but about a specific sequence of events that, taken together, indicate an account takeover or a social engineering attack.

Flink SQL’s MATCH_RECOGNIZE clause lets you express these patterns as regular expressions over event streams. Here is an example that detects a common account takeover pattern: multiple failed login attempts, followed by a successful login, followed by a change of address, followed by a high-value purchase.

SELECT *
FROM account_events
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES
    FIRST(A.event_time) AS first_failed_login,
    B.event_time AS successful_login,
    C.event_time AS address_change,
    D.amount AS purchase_amount
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A{3,} B C D)
  DEFINE
    A AS A.event_type = 'LOGIN_FAILED',
    B AS B.event_type = 'LOGIN_SUCCESS',
    C AS C.event_type = 'ADDRESS_CHANGE',
    D AS D.event_type = 'PURCHASE' AND D.amount > 1000
) AS fraud_pattern;

The PATTERN (A{3,} B C D) expression reads as: three or more failed logins (A), followed by a successful login (B), followed by an address change (C), followed by a purchase over $1,000 (D). Flink evaluates this pattern continuously across the partitioned stream, maintaining the necessary state per user.

Without MATCH_RECOGNIZE, implementing this same logic would require manually managing state machines for every user. You would need to track which “phase” each user is in, handle timeouts, manage out-of-order events, and clean up expired state. The declarative SQL approach reduces hundreds of lines of Java or Python to a single query.
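To appreciate what the query saves you, here is a hand-rolled sketch of the same pattern as a per-user state machine in plain Python, ignoring the timeouts and out-of-order events a real implementation would also have to handle:

```python
# Manual state machine for the A{3,} B C D account-takeover pattern.
# MATCH_RECOGNIZE replaces all of this bookkeeping with one declarative query.
class TakeoverDetector:
    def __init__(self):
        self.failed_logins = 0
        self.phase = "FAILING"  # FAILING -> LOGGED_IN -> ADDRESS_CHANGED

    def on_event(self, event_type: str, amount: float = 0.0) -> bool:
        """Return True when the full failed-logins/login/address/purchase sequence completes."""
        if event_type == "LOGIN_FAILED":
            self.failed_logins += 1
            self.phase = "FAILING"
        elif event_type == "LOGIN_SUCCESS" and self.failed_logins >= 3:
            self.phase = "LOGGED_IN"
        elif event_type == "ADDRESS_CHANGE" and self.phase == "LOGGED_IN":
            self.phase = "ADDRESS_CHANGED"
        elif (event_type == "PURCHASE" and amount > 1000
              and self.phase == "ADDRESS_CHANGED"):
            return True
        return False

d = TakeoverDetector()
for e in ["LOGIN_FAILED"] * 3 + ["LOGIN_SUCCESS", "ADDRESS_CHANGE"]:
    assert d.on_event(e) is False
assert d.on_event("PURCHASE", 2500.0) is True
```

Even this simplified version needs careful phase tracking, and you would have to maintain one instance per user plus expiry logic; the SQL version gets all of that from the engine.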

Layer 4: ML Model Scoring with Async I/O

Rules and patterns catch known fraud signatures. Machine learning models catch the unknown ones. A trained model can score each transaction’s fraud probability based on hundreds of features: spending patterns, device fingerprints, time-of-day distributions, merchant category codes, and more.

The challenge is that ML model inference adds latency. If your model is served behind an HTTP endpoint (as most production ML systems are), each scoring call takes 5-50ms depending on your infrastructure. Calling models synchronously would destroy your pipeline’s throughput.

Flink’s AsyncDataStream operator solves this. It lets you fire off model scoring requests in parallel without blocking event processing:

AsyncDataStream.unorderedWait(
    transactionStream,
    new FraudModelScoringFunction(),  // AsyncFunction that calls the model endpoint
    100, TimeUnit.MILLISECONDS,       // timeout per scoring request
    1000                              // max concurrent requests in flight
);

The FraudModelScoringFunction makes an async HTTP call to your model serving endpoint. Flink manages the concurrency and handles timeouts; with unorderedWait, results are emitted as each request completes, while orderedWait preserves input order at the cost of some added latency. You can have up to 1,000 scoring requests in flight simultaneously without blocking the main processing thread.
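The same bounded-concurrency idea can be sketched in plain Python with asyncio; the score_transaction stub below stands in for the real HTTP call, and the threshold it uses is made up for the example.

```python
import asyncio

MAX_IN_FLIGHT = 1000
TIMEOUT_S = 0.1  # mirrors the 100ms timeout above

sem = asyncio.Semaphore(MAX_IN_FLIGHT)

async def score_transaction(tx: dict) -> float:
    """Stand-in for the HTTP call to a model serving endpoint."""
    await asyncio.sleep(0.01)  # simulated network + inference latency
    return 0.92 if tx["amount"] > 10_000 else 0.10

async def score_with_limits(tx: dict) -> float:
    """Bounded-concurrency scoring with a per-request timeout, like unorderedWait."""
    async with sem:
        try:
            return await asyncio.wait_for(score_transaction(tx), TIMEOUT_S)
        except asyncio.TimeoutError:
            return 1.0  # fail closed: treat timeouts as high risk

async def main() -> list[float]:
    txs = [{"amount": 50}, {"amount": 20_000}]
    return await asyncio.gather(*(score_with_limits(t) for t in txs))

scores = asyncio.run(main())
assert scores == [0.10, 0.92]
```

The fail-closed timeout policy is a deliberate choice: a slow model endpoint should push transactions toward review, not silently wave them through.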

The model returns a risk score (say, 0.0 to 1.0), which you then combine with the outputs from your rule-based and pattern-based layers. A transaction might pass every hard rule but still get flagged because the ML model assigns it a 0.92 risk score based on subtle feature combinations no human would catch.

Feeding the Pipeline: CDC for Account Context

Fraud detection does not happen in isolation. A transaction by itself is just a set of numbers. The context around it is what makes detection accurate. Was the shipping address recently changed? Did the user just reset their password? Is this a new device?

This context lives in your operational databases: account tables, address history, device registrations, authentication logs. Getting this data into your Flink pipeline in real time requires change data capture (CDC).

CDC captures every insert, update, and delete from your databases and streams them as events. When a user changes their shipping address in your PostgreSQL account database, that change shows up in your Flink pipeline within seconds, not hours.

This is where a platform like Streamkap fits naturally. Streamkap streams CDC data from your databases (PostgreSQL, MySQL, MongoDB, and others) directly into Kafka topics that your Flink jobs consume. You do not need to manage Debezium connectors, configure replication slots, or worry about schema evolution. The CDC events arrive as structured records that you can join with your transaction stream in Flink.

Here is what that join looks like in Flink SQL:

SELECT
  t.transaction_id,
  t.amount,
  t.card_id,
  a.address_changed_at,
  a.password_reset_at,
  TIMESTAMPDIFF(MINUTE, a.address_changed_at, t.event_time) AS mins_since_address_change,
  TIMESTAMPDIFF(MINUTE, a.password_reset_at, t.event_time) AS mins_since_password_reset
FROM transactions t
JOIN account_changes a
  ON t.user_id = a.user_id
WHERE TIMESTAMPDIFF(MINUTE, a.address_changed_at, t.event_time) < 60
   OR TIMESTAMPDIFF(MINUTE, a.password_reset_at, t.event_time) < 30;

This query enriches each transaction with account change context. A large purchase that happens 5 minutes after an address change is far more suspicious than the same purchase from a long-standing account with no recent changes. Without CDC feeding this data in real time, you would be making fraud decisions blind to the most important signals.
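Conceptually, the enrichment amounts to keeping the latest account-change timestamps per user and attaching time-since-change features to each transaction. A plain-Python sketch, with illustrative field names:

```python
from datetime import datetime, timedelta

# Latest CDC-derived account state per user; in Flink this would be keyed state
# populated from the account_changes stream.
account_state: dict[str, dict] = {}

def on_account_change(user_id: str, field: str, ts: datetime) -> None:
    """Upsert the latest change timestamp for one account field."""
    account_state.setdefault(user_id, {})[field] = ts

def enrich(tx: dict) -> dict:
    """Attach minutes-since-change features used by the fraud rules."""
    state = account_state.get(tx["user_id"], {})
    for field in ("address_changed_at", "password_reset_at"):
        changed = state.get(field)
        mins = (tx["event_time"] - changed) / timedelta(minutes=1) if changed else None
        tx[f"mins_since_{field.removesuffix('_at')}"] = mins
    return tx

now = datetime(2026, 2, 25, 12, 0)
on_account_change("u1", "address_changed_at", now - timedelta(minutes=5))
tx = enrich({"user_id": "u1", "event_time": now})
assert tx["mins_since_address_changed"] == 5.0
assert tx["mins_since_password_reset"] is None
```

The CDC stream plays the role of on_account_change here: every database update flows in as an event and refreshes the per-user state before the next transaction arrives.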

Putting It All Together: Pipeline Architecture

A production fraud detection pipeline layers all four detection strategies into a single Flink application. Here is how the pieces connect:

Data Sources:

  • Transaction events from your payment processor (via Kafka)
  • CDC streams from account, address, and device databases (via Streamkap to Kafka)
  • Login and authentication events (via Kafka)
  • Device fingerprint data (via API enrichment)

Processing Layers:

  1. Enrichment — Join transaction events with CDC streams to add account context (address changes, password resets, new devices)
  2. Rule engine — Apply deterministic rules (amount thresholds, blocked countries, card-not-present checks)
  3. Velocity checks — Windowed aggregations over multiple time horizons (5 min, 1 hour, 24 hours)
  4. Pattern matching — MATCH_RECOGNIZE queries for multi-step attack sequences
  5. ML scoring — Async I/O calls to model serving endpoints for risk scores
  6. Decision — Combine signals from all layers into a final accept/review/reject decision

Output:

  • Accept decisions route back to the payment processor in under 100ms
  • Review decisions go to a case management queue for human analysts
  • Reject decisions block the transaction and trigger customer notification
  • All decisions (with full feature vectors) are written to a data warehouse for model retraining

The key insight is that no single layer is sufficient on its own. Rules catch the obvious cases but miss novel attacks. ML models catch subtle patterns but are opaque and hard to audit. Velocity checks catch acceleration but miss slow-and-low attacks. Pattern matching catches known sequences but not statistical anomalies. Layering them together gives you defense in depth.
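A decision layer that combines the four signals might look like the following sketch; the thresholds and policy are illustrative, not prescriptive:

```python
def decide(rule_hits: list[str], velocity_flag: bool,
           pattern_match: bool, ml_score: float) -> str:
    """Combine all four layers into a final accept / review / reject decision."""
    # A matched attack sequence or a very high model score is an outright reject.
    if pattern_match or ml_score >= 0.9:
        return "REJECT"
    # Any single weaker signal routes the transaction to a human analyst.
    if rule_hits or velocity_flag or ml_score >= 0.5:
        return "REVIEW"
    return "ACCEPT"

assert decide([], False, False, 0.12) == "ACCEPT"
assert decide(["HIGH_VALUE"], False, False, 0.30) == "REVIEW"
assert decide([], False, False, 0.92) == "REJECT"
```

In production this policy would itself be tuned against chargeback and false-positive rates, but the shape stays the same: hard signals reject, soft signals escalate.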

Operational Considerations

State Size and TTLs

Fraud detection state can grow large. If you are tracking per-user metrics across multiple time windows, you are maintaining state for every active user. Configure TTLs (time-to-live) on your keyed state so that inactive users get cleaned up automatically. A 30-day TTL is typically sufficient. Without TTLs, your state will grow without bound and eventually cause out-of-memory failures.
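The eviction that Flink's state TTL support performs for you can be pictured as follows (plain-Python sketch with timestamps as plain floats):

```python
TTL_S = 30 * 24 * 3600  # 30-day TTL, as suggested above

# user_id -> (last_seen_ts, metrics); Flink's state TTL does this eviction for you.
user_state: dict[str, tuple[float, dict]] = {}

def touch(user_id: str, metrics: dict, now: float) -> None:
    """Record activity for a user, refreshing its TTL clock."""
    user_state[user_id] = (now, metrics)

def expire(now: float) -> int:
    """Drop entries idle longer than the TTL; returns how many were evicted."""
    stale = [uid for uid, (seen, _) in user_state.items() if now - seen > TTL_S]
    for uid in stale:
        del user_state[uid]
    return len(stale)

touch("u1", {"tx_count": 3}, now=0.0)
touch("u2", {"tx_count": 1}, now=40 * 24 * 3600)
assert expire(now=40 * 24 * 3600) == 1  # u1, idle for 40 days, is evicted
assert "u2" in user_state
```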

Checkpointing and Exactly-Once

Fraud detection has zero tolerance for duplicate processing. A transaction that gets scored twice might generate duplicate alerts or, worse, duplicate blocks on legitimate customers. Configure Flink with exactly-once checkpointing and use idempotent sinks. Checkpoint intervals of 30-60 seconds provide a good balance between recovery speed and overhead.

Latency Monitoring

Track p50, p95, and p99 latencies from event ingestion to decision output. Your SLA is probably under 100ms for the full pipeline. If p99 latency creeps above that, you need to investigate. Common culprits: slow ML model endpoints, state backend compaction, or garbage collection pauses.
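Nearest-rank percentiles over a window of latency samples are simple to compute; a minimal sketch:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a window of latency samples."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[rank]

latencies_ms = [12, 15, 18, 22, 25, 30, 41, 55, 80, 140]
assert percentile(latencies_ms, 50) == 25
assert percentile(latencies_ms, 99) == 140
```

In practice you would feed these from a streaming histogram in your metrics system rather than raw samples, but the alerting logic is the same: page when p99 crosses your SLA.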

Model Updates

ML models go stale. Fraud patterns evolve. You need a mechanism to deploy updated models without restarting the Flink job. One approach is to have your async scoring function periodically check for new model versions behind a feature flag or load balancer. Another is to use Flink’s broadcast state to push model configuration changes into the running job.

Getting Started

If you are building this from scratch, start small. Stand up a Flink job that reads transaction events from Kafka, applies a handful of hard rules, and writes flagged transactions to a separate topic. Once that is running reliably, add velocity checks. Then add CDC enrichment using Streamkap to stream account changes from your database. Then add pattern matching. Then integrate ML scoring.

Each layer can be developed, tested, and deployed independently. Flink’s operator chaining and task slot sharing mean that adding layers does not proportionally increase resource consumption. A well-tuned fraud detection pipeline processing 10,000 transactions per second can run comfortably on a modest Flink cluster.

The tooling exists. The patterns are well-understood. The only question is how quickly you can get your first detection rule into production and start iterating from there.