Flink SQL Tumbling Windows Explained with Examples
Learn how tumbling windows work in Flink SQL for fixed-interval stream aggregations. Practical examples for counting, summing, and grouping events over time.
Tumbling windows are one of the most fundamental building blocks in stream processing. They take a continuous, unbounded flow of events and slice it into fixed-size, non-overlapping time intervals, making it possible to compute aggregations that would otherwise have no well-defined scope. If you have ever written a GROUP BY date_trunc('hour', event_time) query in batch SQL, you already understand the motivation. Tumbling windows bring that same concept to data that never stops arriving.
This article walks through how tumbling windows work in Flink SQL, provides practical query examples you can adapt for your own pipelines, and covers the operational details that matter once you move to production. If you are looking for a broader overview of Flink SQL concepts including dynamic tables, joins, and other window types, see the Flink SQL complete guide.
Why do tumbling windows matter? In batch analytics, you can always re-scan the full dataset and recompute an aggregate. In streaming, the dataset is infinite. You need a mechanism that tells Flink when a group of events is “complete” so it can emit a result and release the associated state. Windows provide that mechanism, and tumbling windows are the simplest and most widely used variant.
How Tumbling Windows Work
A tumbling window assigns every incoming event to exactly one window based on the event’s timestamp. Each window has a fixed duration, and windows are contiguous with no gaps or overlaps. Once the window’s time interval passes and the watermark advances beyond its end, Flink fires the aggregation and emits the result.
Consider a 5-minute tumbling window applied to a stream of page-view events. The first window covers [00:00, 00:05), the second covers [00:05, 00:10), the third covers [00:10, 00:15), and so on. A page view with a timestamp of 00:03:42 falls into the first window. A page view at 00:07:15 falls into the second. No event ever straddles two windows.
Time axis:
|-- Window 1 --|-- Window 2 --|-- Window 3 --|
00:00          00:05          00:10          00:15
Event at 00:03:42 --> Window 1
Event at 00:07:15 --> Window 2
Event at 00:10:00 --> Window 3
This is fundamentally different from a batch GROUP BY on a time column. In batch mode, the query runs once over a finite dataset and returns all groups at the end. In streaming, Flink continuously assigns events to windows, maintains running aggregates in state, and emits results as each window closes. The computation is incremental and never-ending.
Flink SQL Syntax
Flink SQL uses the TUMBLE() table-valued function (TVF) to define tumbling windows. The TVF approach was introduced in Flink 1.13 and is the recommended way to write windowed queries. It replaces the older GROUP BY TUMBLE() syntax, which is deprecated.
The basic structure looks like this:
SELECT
window_start,
window_end,
<aggregations>
FROM TABLE(
TUMBLE(TABLE <source_table>, DESCRIPTOR(<time_column>), INTERVAL '<size>' <unit>)
)
GROUP BY window_start, window_end;
The three arguments to TUMBLE() are:
- TABLE <source_table> — The input table or view. It must be preceded by the TABLE keyword.
- DESCRIPTOR(<time_column>) — The time attribute column to window on. This can be an event-time column (with a watermark defined) or a processing-time column.
- INTERVAL '<size>' <unit> — The window size. Units include SECOND, MINUTE, HOUR, and DAY.
After the TUMBLE() TVF, you group by window_start and window_end (and any additional grouping keys), then apply standard SQL aggregate functions in the SELECT clause.
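For reference, the deprecated pre-1.13 form placed TUMBLE() directly in the GROUP BY clause and used companion functions such as TUMBLE_START() to recover window boundaries. A per-minute count in the old style, assuming a page_views table with an event_time attribute as in the examples below, looked like this:

```sql
-- Deprecated group-window syntax (pre-Flink 1.13); prefer the TVF form above.
SELECT
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
  TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
  COUNT(*) AS event_count
FROM page_views
GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE);
```

The TVF form is preferred because it exposes the window boundaries as regular columns and composes with joins and other TVFs.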
Practical Examples
Per-Minute Event Counting
The simplest use case is counting events per time interval. This query counts the number of page views every minute, grouped by page URL:
SELECT
window_start,
window_end,
page_url,
COUNT(*) AS view_count
FROM TABLE(
TUMBLE(TABLE page_views, DESCRIPTOR(event_time), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end, page_url;
Each output row tells you how many times a given page was viewed within a specific one-minute interval. This is the kind of real-time metric you would feed into a monitoring dashboard or an alerting system.
Hourly Revenue Aggregation
For financial reporting, you often need to compute revenue totals on an hourly cadence. Assuming an orders table with an order_total column and an order_time event-time attribute:
SELECT
window_start,
window_end,
currency,
SUM(order_total) AS hourly_revenue,
COUNT(*) AS order_count,
AVG(order_total) AS avg_order_value
FROM TABLE(
TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' HOUR)
)
GROUP BY window_start, window_end, currency;
This produces one row per currency per hour with the total revenue, order count, and average order value. The results can be written to a sink table in a data warehouse or pushed to a live dashboard.
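To persist these results, declare a sink table and wrap the query in an INSERT INTO statement. The sink schema and connector below are illustrative placeholders; substitute the connector options for your actual warehouse or dashboard target:

```sql
-- Hypothetical sink table; connector options depend on your target system.
CREATE TABLE hourly_revenue_sink (
  window_start TIMESTAMP(3),
  window_end TIMESTAMP(3),
  currency STRING,
  hourly_revenue DECIMAL(10,2),
  order_count BIGINT,
  avg_order_value DECIMAL(10,2)
) WITH (
  'connector' = 'jdbc',
  ...
);

INSERT INTO hourly_revenue_sink
SELECT
  window_start,
  window_end,
  currency,
  SUM(order_total) AS hourly_revenue,
  COUNT(*) AS order_count,
  AVG(order_total) AS avg_order_value
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' HOUR)
)
GROUP BY window_start, window_end, currency;
```

Submitted this way, the statement runs as a continuous job: one row per currency is appended to the sink each time an hourly window closes.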
Daily Active User Rollups
Tumbling windows are not limited to small intervals. A daily window works well for computing distinct user counts:
SELECT
window_start,
window_end,
COUNT(DISTINCT user_id) AS daily_active_users
FROM TABLE(
TUMBLE(TABLE user_activity, DESCRIPTOR(activity_time), INTERVAL '1' DAY)
)
GROUP BY window_start, window_end;
Keep in mind that COUNT(DISTINCT ...) requires Flink to maintain a set of all observed user IDs within each window. For very large cardinalities, this can consume significant state. If approximate counts are acceptable, consider using an approximate distinct count UDF.
Using Tumbling Windows with CDC Streams
One of the most powerful patterns is applying tumbling windows to change data capture (CDC) streams. When a database table is captured as a changelog stream, each insert, update, and delete becomes an event. You can aggregate these changes over time to build real-time materialized views.
For example, suppose you are streaming changes from an orders table via CDC and want to compute per-minute order counts by status:
CREATE TABLE orders_cdc (
order_id BIGINT,
status STRING,
amount DECIMAL(10,2),
updated_at TIMESTAMP(3),
WATERMARK FOR updated_at AS updated_at - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
'topic' = 'orders-cdc',
'format' = 'debezium-json',
...
);
SELECT
window_start,
window_end,
status,
COUNT(*) AS change_count,
SUM(amount) AS total_amount
FROM TABLE(
TUMBLE(TABLE orders_cdc, DESCRIPTOR(updated_at), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end, status;
This query processes the raw CDC changelog and produces per-minute summaries of order changes by status. It is especially useful for monitoring database activity in real time, catching anomalies like sudden spikes in cancellations or refunds. Streamkap makes this pattern straightforward by providing managed Flink with pre-built CDC connectors, so you can go from database changes to windowed aggregations without setting up Kafka Connect, managing schema registries, or provisioning Flink clusters yourself.
Event Time vs Processing Time
Flink SQL supports two time semantics for windowing, and choosing the right one has a significant impact on correctness and behavior.
Event time uses a timestamp embedded in the event itself, representing when the event actually occurred. It requires a watermark strategy, which tells Flink how long to wait for late-arriving data before closing a window. Event time produces deterministic results: if you replay the same stream, you get the same output. This is the recommended choice for production workloads.
To use event time, you define a watermark in your table’s DDL:
CREATE TABLE sensor_readings (
sensor_id STRING,
temperature DOUBLE,
reading_time TIMESTAMP(3),
WATERMARK FOR reading_time AS reading_time - INTERVAL '10' SECOND
) WITH (...);
The WATERMARK FOR reading_time AS reading_time - INTERVAL '10' SECOND statement tells Flink that events may arrive up to 10 seconds late. Flink will keep windows open until the watermark passes the window’s end boundary.
Processing time uses the wall-clock time of the Flink operator when it processes the record. It is simpler because it requires no watermark configuration, but the results are non-deterministic. If you reprocess the same data, you get different results because the processing timestamps will differ. Use processing time only for low-latency monitoring where approximate results are acceptable and correctness on replay is not a requirement.
To use processing time, declare the column with the PROCTIME() function:
CREATE TABLE events (
event_id STRING,
payload STRING,
proc_time AS PROCTIME()
) WITH (...);
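With the processing-time column declared, the windowed query itself is identical in shape; you simply window on proc_time instead of an event-time column:

```sql
SELECT
  window_start,
  window_end,
  COUNT(*) AS event_count
FROM TABLE(
  TUMBLE(TABLE events, DESCRIPTOR(proc_time), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end;
```

Because proc_time is assigned when Flink processes each record, no watermark is needed and windows fire as soon as the wall clock passes each boundary.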
Window Output and Metadata
The TUMBLE() TVF exposes three metadata columns that you can use in your queries:
- window_start — The inclusive start timestamp of the window.
- window_end — The exclusive end timestamp of the window.
- window_time — The window's time attribute, which equals window_end - 1 millisecond. This column can be used as a time attribute for downstream operations like cascading window aggregations or window joins.
You must include window_start and window_end in your GROUP BY clause. You can also select window_time if you need to chain the output into another time-based operation:
SELECT
window_start,
window_end,
window_time,
sensor_id,
AVG(temperature) AS avg_temp
FROM TABLE(
TUMBLE(TABLE sensor_readings, DESCRIPTOR(reading_time), INTERVAL '5' MINUTE)
)
GROUP BY window_start, window_end, window_time, sensor_id;
The window_time column preserves the time attribute so you can feed this result into a second tumbling window (for example, computing hourly averages from 5-minute averages) without losing event-time semantics.
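As a sketch of that cascade, the 5-minute result can be registered as a view and windowed again on window_time. One caveat: averaging the per-window averages weights every 5-minute window equally, regardless of how many readings each contained.

```sql
-- First level: 5-minute averages, keeping window_time as a time attribute.
CREATE TEMPORARY VIEW five_minute_avg AS
SELECT
  window_start,
  window_end,
  window_time,
  sensor_id,
  AVG(temperature) AS avg_temp
FROM TABLE(
  TUMBLE(TABLE sensor_readings, DESCRIPTOR(reading_time), INTERVAL '5' MINUTE)
)
GROUP BY window_start, window_end, window_time, sensor_id;

-- Second level: window_time drives an hourly tumbling window.
SELECT
  window_start,
  window_end,
  sensor_id,
  AVG(avg_temp) AS hourly_avg_temp
FROM TABLE(
  TUMBLE(TABLE five_minute_avg, DESCRIPTOR(window_time), INTERVAL '1' HOUR)
)
GROUP BY window_start, window_end, sensor_id;
```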
Performance Considerations
Tumbling window queries are generally efficient because Flink can clear state as soon as a window fires. Unlike session windows, which must track per-key gaps indefinitely, tumbling windows have a bounded lifecycle. However, there are still factors worth monitoring.
State size is driven by the number of distinct grouping keys multiplied by the window duration. A 1-minute window on a low-cardinality key produces small state. A 1-day window with COUNT(DISTINCT user_id) across millions of users produces large state. If state is too large to hold in memory, configure the RocksDB state backend, which keeps working state on disk; it is slower than the heap-based backend but handles much larger volumes.
Parallelism determines how many Flink subtasks process the windowed aggregation concurrently. Higher parallelism increases throughput but also increases the total state that must be checkpointed. Set parallelism based on your input data rate and desired latency.
Late data handling depends on your watermark strategy. Events that arrive after the watermark has passed a window’s end boundary are dropped by default. If late data is common in your pipeline, set a more generous watermark delay; if you need to accept late records into already-fired windows, Flink’s allowed-lateness mechanism is available in the DataStream API.
Checkpointing frequency affects recovery time. More frequent checkpoints mean less reprocessing after a failure, but they also add overhead. A common starting point is a 1-minute checkpoint interval, adjusted based on your state size and recovery requirements.
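In the Flink SQL client, the checkpoint interval can be set as a session configuration option. The value below is the 1-minute starting point mentioned above; tune it to your state size and recovery requirements:

```sql
-- Enable checkpointing every minute for the current session.
SET 'execution.checkpointing.interval' = '1min';
```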
Common Pitfalls
Even experienced engineers run into a few recurring issues when working with tumbling windows in Flink SQL.
Wrong time column type. The column passed to DESCRIPTOR() must be a declared time attribute — either an event-time column with a watermark or a processing-time column declared with PROCTIME(). Passing a plain TIMESTAMP column without a watermark definition causes a planning error.
Missing watermarks. If you define an event-time column but forget the WATERMARK FOR clause, Flink cannot track progress for the window. Windows will never fire because Flink has no way to determine that a window is complete.
State retention with long windows. A 24-hour tumbling window holds state for the entire day before it fires. If your grouping keys have high cardinality, this can consume significant memory. Monitor your state size through Flink’s metrics and consider whether a shorter window with downstream rollup logic would be more practical.
Timezone alignment. Tumbling windows are aligned to the epoch (UTC) by default. An INTERVAL '1' DAY window starts at midnight UTC, not midnight in your local timezone. If you need timezone-aware daily windows, use the optional window offset argument of TUMBLE() or adjust your event timestamps before windowing.
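For example, to align a daily window to midnight in UTC+8 rather than midnight UTC, recent Flink versions accept an optional offset interval as a fourth argument to TUMBLE(). This is a sketch; confirm that your Flink version supports window offsets:

```sql
SELECT
  window_start,
  window_end,
  COUNT(DISTINCT user_id) AS daily_active_users
FROM TABLE(
  -- An offset of -8 hours shifts boundaries from 00:00 UTC to 00:00 UTC+8.
  TUMBLE(TABLE user_activity, DESCRIPTOR(activity_time), INTERVAL '1' DAY, INTERVAL '-8' HOUR)
)
GROUP BY window_start, window_end;
```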
Ignoring backpressure. If your sink cannot keep up with the rate at which windows fire, backpressure will propagate upstream and delay watermark advancement. This can cause windows to accumulate more state than expected, compounding the problem.
When to Use Tumbling Windows vs Other Window Types
Choosing the right window type depends on the semantics you need. Here is a quick comparison.
| Window Type | Overlap | Parameters | Best For |
|---|---|---|---|
| Tumbling | None - events belong to exactly one window | Size | Fixed-interval reporting: per-minute counts, hourly rollups, daily summaries |
| Sliding (Hop) | Windows overlap - one event can appear in multiple windows | Size + slide | Moving averages, trend detection where you need overlapping perspectives on the data |
| Session | Dynamic - windows are defined by gaps in activity | Gap duration | User session analysis, activity-based grouping where intervals are not fixed |
| Cumulate | Expanding - windows grow incrementally within a maximum size | Step + max size | Early results within a larger window, such as showing running totals that update every minute within an hourly window |
Tumbling windows are the right default when your use case aligns with fixed, calendar-style intervals. They are the most memory-efficient window type because each event belongs to a single window and state is released as soon as the window fires. If you need overlapping perspectives on the same data, switch to sliding windows. If your intervals are defined by user behavior rather than the clock, session windows are the better fit.
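If you do need overlapping windows, the switch is mechanical: replace TUMBLE() with the HOP() TVF, which takes a slide interval before the window size. This sketch reuses the page_views table from earlier to compute a 5-minute count refreshed every minute:

```sql
SELECT
  window_start,
  window_end,
  COUNT(*) AS view_count
FROM TABLE(
  -- Arguments: table, time attribute, slide (1 minute), window size (5 minutes).
  HOP(TABLE page_views, DESCRIPTOR(event_time), INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)
)
GROUP BY window_start, window_end;
```

Each event now contributes to five overlapping windows, so expect roughly five times the state and output volume of the equivalent tumbling query.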
For most real-time analytics, dashboards, and operational monitoring, tumbling windows provide the right balance of simplicity, efficiency, and correctness. Combined with event-time processing and a well-tuned watermark, they form the backbone of reliable streaming aggregations in Flink SQL.