
Engineering

February 25, 2026

12 min read

Flink SQL MATCH_RECOGNIZE: Complex Event Processing with SQL

Learn how to detect complex event patterns in streaming data using Flink SQL's MATCH_RECOGNIZE clause. Build fraud detection, anomaly detection, and sequence matching with SQL.

TL;DR:

  • MATCH_RECOGNIZE brings regex-like pattern matching to SQL streams - detect sequences of events such as "price drop followed by price spike" using declarative SQL.
  • Patterns are defined with row pattern variables, quantifiers (*, +, ?), and DEFINE conditions that specify what each pattern variable matches.
  • It is the SQL standard (ISO SQL:2016) approach to Complex Event Processing (CEP), replacing custom Java/Scala CEP libraries with readable SQL.

Streaming data is everywhere - transactions flowing through payment systems, sensor readings from factory floors, clickstreams across web applications. Individually, these events are unremarkable. But when specific sequences of events occur in a particular order, they tell a story: a fraud attempt in progress, a machine about to fail, or a user on the verge of converting.

Complex Event Processing (CEP) is the discipline of detecting these meaningful patterns within continuous event streams. Traditionally, building CEP required custom application code using specialized libraries. Flink SQL’s MATCH_RECOGNIZE clause changes that equation entirely, bringing regex-like pattern matching directly into SQL. If your team already knows SQL, you can now write sophisticated pattern detection logic without a single line of Java or Scala.

MATCH_RECOGNIZE Syntax Breakdown

The MATCH_RECOGNIZE clause is appended to a standard FROM clause and contains several sub-clauses that work together. Here is the full structure:

SELECT *
FROM MyStream
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES
    A.event_time AS start_time,
    LAST(B.event_time) AS end_time,
    A.price AS start_price,
    LAST(B.price) AS end_price
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B+ C)
  DEFINE
    A AS A.price > 100,
    B AS B.price > LAST(B.price, 1) OR B.price > A.price,
    C AS C.price < LAST(B.price)
)

Each sub-clause plays a distinct role:

  • PARTITION BY divides the stream into independent groups. Pattern matching operates within each partition separately, just like a window function’s PARTITION BY. In the example above, patterns are detected per user.
  • ORDER BY establishes the logical ordering of rows within each partition. For event streams, this is almost always an event timestamp. Flink requires this to know which row comes “before” or “after” another.
  • MEASURES defines the output columns. You extract values from matched rows using the pattern variable names (A, B, C) and aggregate or navigation functions. This is what you actually see in your query results.
  • PATTERN specifies the sequence of row pattern variables using regex-like syntax. (A B+ C) means: one row matching variable A, followed by one or more rows matching variable B, followed by one row matching variable C.
  • DEFINE provides the boolean conditions that determine whether a given row qualifies as a particular pattern variable. Any variable not defined in DEFINE matches every row (acting as a wildcard).

This separation of concerns - what to look for (PATTERN), how to identify each step (DEFINE), and what to output (MEASURES) - makes MATCH_RECOGNIZE both powerful and readable.

Pattern Variables and Quantifiers

Pattern variables are the building blocks of your match expression. Each variable name (A, B, C, or any identifier you choose) represents a category of row that you define with a boolean condition.

Quantifiers

Quantifiers control how many consecutive rows a variable can match:

Quantifier | Meaning               | Example
(none)     | Exactly one row       | A matches one row
*          | Zero or more rows     | A* matches zero or more consecutive rows
+          | One or more rows      | A+ matches one or more consecutive rows
?          | Zero or one row       | A? optionally matches one row
{n}        | Exactly n rows        | A{3} matches exactly three rows
{n,m}      | Between n and m rows  | A{2,5} matches two to five rows
{n,}       | At least n rows       | A{3,} matches three or more rows
{,m}       | At most m rows        | A{,4} matches up to four rows

By default, quantifiers are greedy - they consume as many rows as possible while still allowing the overall pattern to match. Appending a ? to any quantifier makes it reluctant (lazy), consuming as few rows as possible. For example, A+? matches the minimum number of rows that still satisfies the pattern.
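
Because the PATTERN clause is essentially a regex over rows, Python's re module can illustrate the greedy/reluctant distinction. Here each character stands in for one row (a for an A-row, b for a B-row) - an analogy for intuition, not Flink code:

```python
import re

# One A row followed by three B rows, encoded as characters.
rows = "abbb"

# Greedy B+ consumes as many B rows as possible; reluctant B+? as few as possible.
greedy = re.match(r"a(b+)", rows)
lazy = re.match(r"a(b+?)", rows)

print(greedy.group(1))  # bbb - all three B rows
print(lazy.group(1))    # b   - just one B row
```

The same trade-off applies inside MATCH_RECOGNIZE: a greedy quantifier extends a match as far as the data allows, while a reluctant one closes the match at the earliest opportunity.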

Combining Variables

Variables can be combined in sequences, alternations, and groups:

-- Sequence: A then B then C
PATTERN (A B C)

-- Alternation: A followed by either B or C
PATTERN (A (B | C))

-- Grouped quantifier: A then one or more (B followed by C) pairs
PATTERN (A (B C)+)

This composability lets you express arbitrarily complex event sequences using familiar regex concepts.
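
The same regex analogy (characters standing in for rows) shows how grouping composes - for instance, a(bc)+ mirrors PATTERN (A (B C)+):

```python
import re

# PATTERN (A (B C)+): one A row, then one or more complete (B C) pairs.
full_pairs = re.fullmatch(r"a(bc)+", "abcbc") is not None   # two complete pairs
broken_pair = re.fullmatch(r"a(bc)+", "abcb") is not None   # trailing b has no c

print(full_pairs, broken_pair)  # True False
```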

Row Pattern Navigation Functions

Inside both the DEFINE and MEASURES clauses, you can reference values from other rows in the current match using navigation functions:

  • LAST(variable.column, n) - Returns the column value from the nth-to-last row mapped to the variable. LAST(B.price, 0) is the most recent B row’s price; LAST(B.price, 1) is the B row before that.
  • FIRST(variable.column, n) - Returns the column value from the nth row mapped to the variable, counting forward from the first mapped row. FIRST(B.price, 0) is the first B row’s price.
  • PREV(variable.column, n) - Returns the column value from the row n positions before the current row in the partition ordering, regardless of which pattern variable that row was mapped to. The offset defaults to 1.
  • NEXT(variable.column, n) - The forward-looking counterpart of PREV, defined by the SQL standard. Because a streaming engine evaluates conditions as rows arrive and cannot peek ahead, NEXT is not generally usable in Flink’s MATCH_RECOGNIZE.

These functions are essential for writing conditions that compare the current row to previous rows. For instance, detecting a price increase requires comparing the current price to the previous price:

DEFINE
  UP AS UP.price > PREV(UP.price, 1)
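
For intuition, the same comparison in plain Python, treating list position as the partition ordering (a sketch of the semantics, not Flink code):

```python
# A row qualifies as UP when its price exceeds the previous row's price -
# the value that PREV(UP.price, 1) returns in the SQL above.
prices = [100, 102, 101, 103, 105]

# The first row has no predecessor, so flags start from the second row.
up_flags = [prices[i] > prices[i - 1] for i in range(1, len(prices))]

print(up_flags)  # [True, False, True, True]
```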

Output Modes

MATCH_RECOGNIZE supports two output modes that control how many rows are emitted for each successful match:

ONE ROW PER MATCH

Produces a single summary row per match. This is the most common mode and works well when you want aggregated results - for example, the start time, end time, and total value of a matched pattern.

ONE ROW PER MATCH
MEASURES
  FIRST(A.event_time) AS pattern_start,
  LAST(C.event_time) AS pattern_end,
  COUNT(B.amount) AS num_events

ALL ROWS PER MATCH

Produces one output row for every input row that participated in the match. This is useful for debugging or when you need to tag each individual event with its role in the matched pattern.

ALL ROWS PER MATCH
MEASURES
  CLASSIFIER() AS pattern_variable,
  MATCH_NUMBER() AS match_id

The CLASSIFIER() function returns the name of the pattern variable that the current row was mapped to (e.g., ‘A’, ‘B’, or ‘C’), and MATCH_NUMBER() returns a sequential identifier for each match.

After Match Strategy

Once a pattern is matched, Flink needs to know where to resume scanning for the next match. The AFTER MATCH clause controls this behavior:

  • SKIP PAST LAST ROW (default) - Resumes scanning from the row after the last row in the completed match. Matched rows are never part of another match.
  • SKIP TO NEXT ROW - Resumes scanning from the row immediately after the first row of the completed match. This allows overlapping matches, which is important when patterns can share starting events.
  • SKIP TO FIRST variable - Resumes from the first row mapped to the specified variable.
  • SKIP TO LAST variable - Resumes from the last row mapped to the specified variable.

Choosing the right strategy depends on your use case. For non-overlapping alerts (like fraud notifications), SKIP PAST LAST ROW prevents duplicate alerts. For continuous monitoring where every potential match matters (like price pattern scanning), SKIP TO NEXT ROW ensures no pattern is missed.
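
A regex analogy makes the difference concrete. Encoding each row as u when the price rose, and scanning for three consecutive rises, non-overlapping search behaves like SKIP PAST LAST ROW while an overlapping (lookahead) search behaves like SKIP TO NEXT ROW - an analogy, not Flink code:

```python
import re

# Four rises followed by a drop, one character per row.
stream = "uuuud"

# SKIP PAST LAST ROW: resume after the match -> non-overlapping matches.
past_last = re.findall(r"uuu", stream)

# SKIP TO NEXT ROW: resume one row after the match start -> overlapping matches.
to_next = [m.start() for m in re.finditer(r"(?=uuu)", stream)]

print(len(past_last))  # 1 - the second potential match is swallowed
print(to_next)         # [0, 1] - both starting positions are reported
```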

Practical Examples

Detecting Price V-Shaped Patterns

A classic financial pattern: price drops for several consecutive ticks, then recovers. This V-shape can signal a buying opportunity or a market correction.

SELECT *
FROM stock_ticks
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tick_time
  MEASURES
    START_ROW.tick_time AS start_time,
    LAST(DOWN.tick_time) AS bottom_time,
    LAST(UP.tick_time) AS recovery_time,
    START_ROW.price AS start_price,
    LAST(DOWN.price) AS bottom_price,
    LAST(UP.price) AS recovery_price
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  -- Flink does not allow a greedy quantifier as the last pattern element,
  -- so a single UP row (rather than UP+) anchors the recovery
  PATTERN (START_ROW DOWN+ UP)
  DEFINE
    DOWN AS DOWN.price < PREV(DOWN.price, 1),
    UP AS UP.price > PREV(UP.price, 1)
) AS matched;

This query partitions by stock symbol, finds sequences where prices decline for one or more ticks and then turn back up, and outputs the timing and prices at each phase of the V-shape.
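
As plain-Python intuition (not the Flink engine), the bottom of a V is just a local minimum: a falling tick followed by a rising tick. This toy detector, with the invented helper find_v, returns the index of the first such bottom in a list of prices:

```python
def find_v(prices):
    """Return the index of the bottom tick of the first V-shape, else None."""
    for i in range(1, len(prices) - 1):
        # Price fell into position i and rose out of it: a local minimum.
        if prices[i - 1] > prices[i] and prices[i] < prices[i + 1]:
            return i
    return None

print(find_v([10, 9, 8, 9]))  # 2 - the price bottoms out at index 2
print(find_v([1, 2, 3]))      # None - prices only rise, no V-shape
```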

Fraud Detection: Rapid Small Transactions Followed by Large Withdrawal

Financial fraud often follows a recognizable pattern: several small “test” transactions in quick succession, followed by a large withdrawal. MATCH_RECOGNIZE makes this straightforward to express:

SELECT *
FROM transactions
MATCH_RECOGNIZE (
  PARTITION BY account_id
  ORDER BY txn_time
  MEASURES
    FIRST(SMALL.txn_time) AS first_small_txn,
    LAST(SMALL.txn_time) AS last_small_txn,
    BIG.txn_time AS large_txn_time,
    BIG.amount AS large_amount,
    COUNT(SMALL.amount) AS num_small_txns
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (SMALL{3,} BIG)
  DEFINE
    -- each small txn must occur within 10 minutes of the previous row
    SMALL AS SMALL.amount < 5.00
      AND SMALL.txn_time < PREV(SMALL.txn_time, 1) + INTERVAL '10' MINUTE,
    BIG AS BIG.amount > 500.00
      AND BIG.txn_time < LAST(SMALL.txn_time) + INTERVAL '30' MINUTE
) AS fraud_alert;

The pattern requires at least three small transactions (under $5) occurring within 10 minutes of each other, followed by a large transaction (over $500) within 30 minutes of the last small one. When this pattern matches, the query emits an alert with all the relevant details.
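
As a sanity check on the logic, here is a toy plain-Python version of the same rule. It is illustrative only - the names detect and txns are invented for this sketch, and it ignores per-account partitioning:

```python
from datetime import datetime, timedelta

def detect(txns):
    """txns: list of (time, amount) pairs sorted by time. True on a fraud match."""
    small = []  # timestamps of the current run of small transactions
    for t, amt in txns:
        if amt < 5.00:
            if small and t - small[-1] > timedelta(minutes=10):
                small = []  # gap too large - start a new run
            small.append(t)
        elif amt > 500.00 and len(small) >= 3 and t - small[-1] <= timedelta(minutes=30):
            return True   # large withdrawal soon after >= 3 small test txns
        else:
            small = []    # a mid-size txn breaks the sequence
    return False

base = datetime(2026, 1, 1, 12, 0)
txns = [(base, 1.00), (base + timedelta(minutes=2), 2.00),
        (base + timedelta(minutes=5), 1.50), (base + timedelta(minutes=20), 900.00)]
print(detect(txns))  # True
```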

User Funnel Analysis: Signup, Activation, Purchase

Product teams often need to identify users who complete a specific journey through their application. MATCH_RECOGNIZE can detect this funnel in real time:

SELECT *
FROM user_events
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES
    SIGNUP.event_time AS signup_time,
    ACTIVATE.event_time AS activation_time,
    PURCHASE.event_time AS purchase_time,
    TIMESTAMPDIFF(MINUTE, SIGNUP.event_time, PURCHASE.event_time) AS minutes_to_convert
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (SIGNUP ANY_EVENT* ACTIVATE ANY_EVENT* PURCHASE)
  DEFINE
    SIGNUP AS SIGNUP.event_type = 'signup',
    ACTIVATE AS ACTIVATE.event_type = 'activation',
    PURCHASE AS PURCHASE.event_type = 'purchase',
    ANY_EVENT AS ANY_EVENT.event_type NOT IN ('signup', 'activation', 'purchase')
) AS funnel;

The ANY_EVENT* variable absorbs any number of intermediate events that are not themselves funnel steps, capturing the full conversion journey regardless of how many other actions the user takes between stages.
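
A minimal Python sketch of the funnel scan (the helper completed_funnel is invented for illustration; it loosely treats every non-matching event as a wildcard, slightly looser than the ANY_EVENT definition above):

```python
def completed_funnel(events, stages=("signup", "activation", "purchase")):
    """events: one user's event types in time order. True if all stages occur in order."""
    i = 0
    for e in events:
        if e == stages[i]:
            i += 1
            if i == len(stages):
                return True  # every funnel stage reached in sequence
    return False

print(completed_funnel(["signup", "page_view", "activation", "click", "purchase"]))  # True
print(completed_funnel(["signup", "purchase"]))  # False - activation never happened
```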

IoT Sensor Alert: Temperature Rising Above Threshold

Manufacturing and infrastructure monitoring often requires detecting sustained anomalies - not just a single spike, but a pattern that indicates a real problem:

SELECT *
FROM sensor_readings
MATCH_RECOGNIZE (
  PARTITION BY sensor_id
  ORDER BY reading_time
  MEASURES
    FIRST(HOT.reading_time) AS alert_start,
    LAST(HOT.reading_time) AS alert_end,
    COUNT(HOT.temperature) AS consecutive_readings,
    AVG(HOT.temperature) AS avg_temperature
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  -- Flink does not allow a greedy open-ended quantifier as the last pattern
  -- element, so the match is capped at the first five hot readings
  PATTERN (NORMAL HOT{5})
  DEFINE
    NORMAL AS NORMAL.temperature <= 80.0,
    HOT AS HOT.temperature > 80.0
) AS overheat_alert;

This detects five or more consecutive readings above 80 degrees following a normal reading. The normal reading anchors the start of the pattern, ensuring that the alert fires at the transition from safe to dangerous conditions rather than repeatedly within an ongoing overheat event.
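
A toy Python analogue (the helper overheat_alerts is invented for this sketch) shows the anchoring behavior: an alert fires once when five hot readings follow a normal one, and not again until the sensor cools and reheats:

```python
def overheat_alerts(readings, threshold=80.0, run=5):
    """Return the start index of each run of `run` hot readings after a normal one."""
    alerts, streak, seen_normal = [], 0, False
    for i, temp in enumerate(readings):
        if temp > threshold:
            streak += 1
            if streak == run and seen_normal:
                alerts.append(i - run + 1)  # index where the hot run began
        else:
            seen_normal, streak = True, 0   # normal reading re-arms the alert
    return alerts

readings = [75, 78, 85, 86, 88, 90, 91, 92, 79, 85]
print(overheat_alerts(readings))  # [2] - one alert at the safe-to-hot transition
```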

Performance Considerations

MATCH_RECOGNIZE maintains state for every active partial match. In high-throughput streams, this state can grow significantly, so understanding the performance implications is important.

Pattern complexity drives state size. Patterns with unbounded quantifiers (A+, B*) on high-cardinality partitions can accumulate large amounts of state as Flink tracks many open partial matches simultaneously. Where possible, use bounded quantifiers (A{1,10}) to cap the maximum match length.

Timeouts prevent unbounded state growth. If a pattern never completes, its partial match state lingers indefinitely. Flink lets you put a time constraint on the whole pattern with the WITHIN clause - for example, PATTERN (A B+ C) WITHIN INTERVAL '1' HOUR prunes any partial match that does not complete within an hour. For finer-grained control, you can also build temporal constraints into individual DEFINE conditions, such as B.event_time < A.event_time + INTERVAL '1' HOUR.

Partition cardinality matters. Each distinct partition key maintains its own pattern matching state. Partitioning by a high-cardinality field (like individual user IDs across millions of users) means millions of independent state machines. Consider whether a coarser partitioning scheme works for your use case.

After match strategy affects throughput. SKIP TO NEXT ROW creates more potential matches than SKIP PAST LAST ROW because it evaluates overlapping windows. For alerting use cases where you only need the first match, SKIP PAST LAST ROW is more efficient.

When running MATCH_RECOGNIZE queries at scale, a managed Flink platform like Streamkap handles the underlying state management, checkpointing, and resource allocation automatically, so you can focus on writing correct patterns rather than tuning infrastructure.

MATCH_RECOGNIZE vs Window Functions

Both MATCH_RECOGNIZE and window functions operate on ordered sequences of rows, but they solve fundamentally different problems.

Window functions compute aggregates over a fixed frame of rows. They answer questions like “What is the moving average of the last 10 prices?” or “What is the maximum temperature in the last hour?” The frame is defined by size or time, and every row in the window contributes to the result.

MATCH_RECOGNIZE detects variable-length sequences that satisfy structural constraints. It answers questions like “Did the price drop three times in a row and then spike?” The number of rows in a match is not predetermined - it depends on the data. The pattern may span 4 rows or 40 rows depending on how the events unfold.

Here is a practical comparison:

Criterion           | Window Functions                      | MATCH_RECOGNIZE
Frame definition    | Fixed size or time interval           | Variable, determined by pattern
Row selection       | All rows in frame                     | Only rows matching DEFINE conditions
Output              | One result per input row              | One result per match (or per matched row)
Ordering dependency | Aggregates within frame               | Detects sequences across rows
Use case            | Running averages, rankings, lag/lead  | Fraud patterns, session sequences, trend detection

If your question is about aggregating within a known window, use window functions. If your question is about detecting a specific sequence of conditions whose length is unknown in advance, MATCH_RECOGNIZE is the right tool.

In many real-world pipelines, you will use both. A window function might compute a rolling average, and MATCH_RECOGNIZE might detect when that rolling average follows a specific trajectory - combining the strengths of both approaches within Streamkap’s managed Flink environment.

Wrapping Up

MATCH_RECOGNIZE brings the full power of Complex Event Processing to teams that think in SQL. Instead of writing imperative pattern detection code in Java or Scala, you declare the pattern you are looking for, define what each step means, and let Flink’s runtime handle the matching, state management, and output. The result is pattern detection logic that is readable, maintainable, and testable - three qualities that matter when that logic is standing between your business and fraud, equipment failure, or missed revenue.