Flink SQL Sliding (Hop) Windows: When and How to Use Them
Master sliding windows in Flink SQL for overlapping time-based aggregations. Learn syntax, use cases, and performance tuning with real-world streaming examples.
Tumbling windows give you clean, non-overlapping time buckets. But what happens when you need results more often than the window closes — when a 10-minute aggregate updated every 10 minutes is simply too slow for your alerting pipeline or real-time dashboard? That is where sliding windows come in.
Sliding windows (also called hop windows in Flink SQL) produce overlapping time intervals. A single event can participate in multiple windows simultaneously, giving you smoothed, continuously updated aggregations that respond to changes faster than any non-overlapping approach. If you are new to Flink SQL or want a broader view of all window types, start with our complete Flink SQL guide before getting into the details here.
How Sliding Windows Work
The core idea behind a sliding window is simple: you define two parameters instead of one.
- Window size — how much historical data each window covers (e.g., 10 minutes).
- Slide interval — how frequently a new window starts (e.g., 2 minutes).
With a 10-minute size and a 2-minute slide, a new window opens every 2 minutes, and each window spans the previous 10 minutes of data. At any given moment, five windows are active simultaneously (10 / 2 = 5), and every incoming event lands in all five of them.
Contrast with Tumbling Windows
A tumbling window is really just a special case of a sliding window where the size equals the slide. Each event belongs to exactly one window, and there is no overlap. Tumbling windows are ideal when you need mutually exclusive intervals — hourly totals, daily counts, or end-of-period snapshots.
Sliding windows are the right tool when you need the opposite: frequent, overlapping results that smooth out short-term spikes and give you a rolling view of your data. Think of it as the difference between checking your bank balance once a day versus watching a moving average of your spending every hour.
Visualizing the Overlap
Consider events arriving on a timeline with a 10-minute window and a 5-minute slide:
Time:     00:00   00:05   00:10   00:15   00:20   00:25
Window 1: [===============]
Window 2:         [===============]
Window 3:                 [===============]
Window 4:                         [===============]
An event at 00:07 falls inside Window 1 (00:00-00:10) and Window 2 (00:05-00:15). An event at 00:12 falls inside Window 2 and Window 3. Each window produces its own independent aggregate result when it closes, so you get a new output every 5 minutes covering the last 10 minutes of data.
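This window-assignment logic is easy to model outside Flink. The sketch below is a toy Python model of epoch-aligned hop windows (timestamps as integer minutes, not the Flink API), assuming windows start on slide boundaries the way Flink aligns them:

```python
def hop_windows(event_ts, size, slide):
    """Return (start, end) of every hop window containing event_ts.

    Toy model: windows are aligned to multiples of the slide, mirroring
    Flink's epoch alignment. All values share one unit (minutes here).
    """
    windows = []
    # The latest candidate window starts at the most recent slide
    # boundary at or before the event's timestamp.
    start = (event_ts // slide) * slide
    # Walk backwards one slide at a time while the window still
    # covers the event (windows are half-open: [start, start + size)).
    while start + size > event_ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

# The events from the diagram above (10-minute size, 5-minute slide):
print(hop_windows(7, 10, 5))    # event at 00:07 -> [(0, 10), (5, 15)]
print(hop_windows(12, 10, 5))   # event at 00:12 -> [(5, 15), (10, 20)]
```

With the 10-minute size and 2-minute slide from earlier, the same function returns five windows per event, matching the 10 / 2 = 5 overlap count.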
Flink SQL Syntax: The HOP() Table-Valued Function
Flink SQL exposes sliding windows through the HOP() table-valued function (TVF). This is the modern, recommended approach introduced in Flink 1.13, replacing the older HOP group window function.
SELECT
  window_start,
  window_end,
  sensor_id,
  AVG(temperature) AS avg_temp,
  COUNT(*) AS reading_count
FROM TABLE(
  HOP(
    TABLE sensor_readings,
    DESCRIPTOR(event_time),
    INTERVAL '2' MINUTES,   -- slide interval
    INTERVAL '10' MINUTES   -- window size
  )
)
GROUP BY window_start, window_end, sensor_id;
The HOP() function takes four arguments:
- TABLE — the source table or view.
- DESCRIPTOR — the time attribute column (event time or processing time).
- Slide interval — how often a new window opens.
- Window size — the duration each window covers.
The function adds window_start, window_end, and window_time columns to the result. Include window_start and window_end in the GROUP BY clause alongside any other grouping keys.
A Note on Parameter Order
A common source of confusion: the slide interval comes before the window size in HOP(). This differs from some other streaming frameworks where the size is listed first. Getting the order wrong will not produce an error — it will produce windows with an unexpected configuration.
Practical Examples
5-Minute Moving Average Updated Every Minute
This is the classic sliding window use case. You want a smoothed view of a metric that updates frequently, eliminating the noise of individual data points.
SELECT
  window_start,
  window_end,
  device_id,
  AVG(cpu_usage) AS avg_cpu,
  MAX(cpu_usage) AS peak_cpu
FROM TABLE(
  HOP(
    TABLE server_metrics,
    DESCRIPTOR(event_time),
    INTERVAL '1' MINUTE,
    INTERVAL '5' MINUTES
  )
)
GROUP BY window_start, window_end, device_id;
Each minute, you get a fresh average of the last 5 minutes. This is far more useful for dashboards and alerting than a single raw data point, because transient spikes are smoothed into the broader trend.
Hourly Trend Detection with 10-Minute Slides
For identifying longer-term patterns, such as a gradual increase in error rates, widen both the size and the slide:
SELECT
  window_start,
  window_end,
  service_name,
  COUNT(*) FILTER (WHERE status_code >= 500) AS error_count,
  COUNT(*) AS total_requests,
  CAST(COUNT(*) FILTER (WHERE status_code >= 500) AS DOUBLE)
    / COUNT(*) AS error_rate
FROM TABLE(
  HOP(
    TABLE http_requests,
    DESCRIPTOR(event_time),
    INTERVAL '10' MINUTES,
    INTERVAL '1' HOUR
  )
)
GROUP BY window_start, window_end, service_name;
Every 10 minutes, you get the error rate over the past hour. This lets you detect a slow creep in failures that a 1-minute tumbling window would miss entirely.
Rate Limiting and Spike Detection
Sliding windows are natural for rate-based alerting. The following query detects when a user exceeds 100 API calls within any rolling 5-minute period, checking every minute:
SELECT
  window_start,
  window_end,
  user_id,
  COUNT(*) AS request_count
FROM TABLE(
  HOP(
    TABLE api_requests,
    DESCRIPTOR(event_time),
    INTERVAL '1' MINUTE,
    INTERVAL '5' MINUTES
  )
)
GROUP BY window_start, window_end, user_id
HAVING COUNT(*) > 100;
Because the windows overlap, a burst of activity at the boundary between two tumbling windows will not slip through undetected. Every minute, you re-evaluate the full 5-minute window.
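To see why the overlap matters at boundaries, here is a small Python simulation (a toy model with integer-minute timestamps and epoch-aligned windows, not Flink): a burst split across two tumbling windows stays under a per-window threshold, while a hop window spanning the boundary sees the whole burst.

```python
from collections import Counter

def tumbling_counts(events, size):
    """Events per tumbling window, keyed by window start."""
    return Counter((t // size) * size for t in events)

def max_hop_count(events, size, slide):
    """Largest event count seen by any epoch-aligned hop window."""
    starts = {(t // slide) * slide - k * slide
              for t in events
              for k in range(size // slide)}
    return max(sum(1 for t in events if s <= t < s + size) for s in starts)

# 60 requests in a burst straddling the 00:10 tumbling boundary.
burst = [8] * 30 + [11] * 30

print(max(tumbling_counts(burst, 10).values()))  # 30 per tumbling window
print(max_hop_count(burst, 10, 5))               # 60 in the hop window [5, 15)
```

Neither tumbling window alone would trip a threshold of 50, but the overlapping hop window covering 00:05-00:15 catches the full burst of 60.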
Sliding Windows on CDC Event Streams
When your source data comes from a database via change data capture (CDC), sliding windows let you compute real-time aggregations over live database mutations. For example, tracking rolling order totals from a PostgreSQL database streamed through Streamkap:
SELECT
  window_start,
  window_end,
  region,
  SUM(order_total) AS rolling_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM TABLE(
  HOP(
    TABLE orders_cdc,
    DESCRIPTOR(event_time),
    INTERVAL '5' MINUTES,
    INTERVAL '30' MINUTES
  )
)
GROUP BY window_start, window_end, region;
Every 5 minutes, you see the rolling 30-minute revenue by region — computed directly from the stream of inserts, updates, and deletes flowing out of your transactional database. This pattern turns an operational database into a real-time analytics source without building a separate batch pipeline.
Choosing Size and Slide Parameters
The relationship between size and slide determines both the behavior and the cost of your sliding window.
The overlap ratio is window_size / slide_interval. This number tells you two things:
- How many windows each event participates in.
- The state multiplication factor — how much more memory and CPU you use compared to a tumbling window of the same size.
| Size | Slide | Overlap Ratio | Events per Window Copy |
|---|---|---|---|
| 10 min | 10 min | 1 (tumbling) | 1 |
| 10 min | 5 min | 2 | 2 |
| 10 min | 1 min | 10 | 10 |
| 1 hour | 1 min | 60 | 60 |
A ratio of 2 is generally safe and doubles your state footprint. A ratio of 10 starts to be expensive. A ratio of 60 should trigger a serious conversation about whether you genuinely need minute-by-minute updates over an hourly window, or whether a 5-minute slide (ratio of 12) would serve equally well.
Rule of thumb: start with the coarsest slide interval that meets your latency requirements, then tighten only if the business case justifies the additional resource consumption.
Performance and State Management
Sliding windows are more expensive than tumbling windows because of the overlap. Each event is duplicated across multiple window buckets in Flink’s state backend, and each window must be independently aggregated and emitted.
State Size Estimation
For a rough estimate, multiply your tumbling window state by the overlap ratio. If a tumbling 10-minute window over your data produces 50 MB of state, switching to a 1-minute slide (overlap ratio of 10) will push that toward 500 MB.
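That back-of-the-envelope rule is simple enough to script. The sketch below assumes the linear scaling described above; real state size also depends on key cardinality, aggregate types, and backend overhead:

```python
def sliding_state_estimate_mb(tumbling_state_mb, size, slide):
    """Rough sliding-window state estimate: tumbling-window state
    multiplied by the overlap ratio (size / slide).

    An approximation for capacity planning, not a guarantee.
    """
    overlap_ratio = size / slide
    return tumbling_state_mb * overlap_ratio

# The example from the text: 50 MB of tumbling state, 10-minute window,
# 1-minute slide (overlap ratio 10).
print(sliding_state_estimate_mb(50, 10, 1))  # 500.0
```

Running the same estimate with a 5-minute slide (ratio 2) yields 100 MB, which is one way to quantify the trade-off before loosening or tightening the slide.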
This state lives either in-memory (with the HashMapStateBackend) or on local disk (with the RocksDB state backend). For high-overlap configurations, RocksDB is strongly recommended because it can spill to disk rather than exhausting JVM heap memory.
Checkpoint Considerations
Larger state means longer checkpoint durations. If your checkpoint interval is 60 seconds and your state grows large enough that snapshotting takes 45 seconds, you are dangerously close to a checkpoint timeout. Monitor checkpoint duration and alignment time, and scale up parallelism or increase checkpoint intervals as needed.
On a managed Flink platform like Streamkap, checkpoint tuning, state backend configuration, and resource scaling are handled automatically, so you can focus on query logic instead of operational overhead.
Event Time and Watermarks
Sliding windows operate on a time attribute — typically event time, which is the timestamp embedded in the record itself. Flink uses watermarks to track the progress of event time and to decide when a window can be finalized.
How Watermarks Interact with Sliding Windows
Flink generates a watermark that says, in effect, “I believe all events with a timestamp earlier than W have arrived.” When a watermark passes the end of a window, that window fires and emits its result.
With sliding windows, multiple windows close at different times, and a single watermark advance can trigger several window emissions simultaneously. This is normal and expected behavior.
Handling Late Data
Events that arrive after the watermark has passed their window's end time are considered late. By default, Flink drops them. In the DataStream API you can configure an allowed lateness period to accept stragglers and re-emit corrected window results; in Flink SQL, late-data tolerance is typically controlled through the watermark strategy declared on the source table:
CREATE TABLE sensor_readings (
  sensor_id STRING,
  temperature DOUBLE,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '10' SECONDS
) WITH ( ... );
The watermark definition event_time - INTERVAL '10' SECONDS tells Flink to tolerate up to 10 seconds of out-of-order data. Events arriving more than 10 seconds late will be dropped. For sliding windows, choosing an appropriate watermark delay is critical because a late event that misses one window likely misses several overlapping windows as well.
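That last point can be quantified with the same toy window model used earlier (Python with integer time units and epoch-aligned windows, not the Flink API): count how many of an event's overlapping windows have already been finalized by the current watermark.

```python
def missed_windows(event_ts, watermark, size, slide):
    """Number of hop windows containing the event whose end the
    watermark has already passed.

    Flink fires a window once the watermark reaches window_end, so a
    late event misses every one of its windows that has already fired.
    """
    missed = 0
    start = (event_ts // slide) * slide
    while start + size > event_ts:          # window still covers the event
        if watermark >= start + size:       # window has already fired
            missed += 1
        start -= slide
    return missed

# An event at t=7 with a 10-unit window and 2-unit slide sits in 5 windows.
print(missed_windows(7, 13, 10, 2))  # 3 of its 5 windows have already fired
print(missed_windows(7, 16, 10, 2))  # all 5 have fired
```

This is why the watermark delay matters more for sliding windows than for tumbling ones: a single late event is late for several overlapping windows at once.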
Common Pitfalls
Excessive Overlap Ratios
The most frequent mistake is choosing a slide interval that is far too small relative to the window size. A 24-hour window with a 1-second slide creates 86,400 simultaneous windows per key. This will consume enormous amounts of state and almost certainly cause out-of-memory failures or checkpoint timeouts. Always calculate the overlap ratio before deploying.
Confusing Slide and Size Parameter Order
As noted earlier, HOP() expects the slide interval before the window size. Swapping them produces a valid but incorrect configuration that can be difficult to debug in production.
Ignoring Output Volume
Each sliding window emits a separate result row. With a 1-minute slide and 100,000 distinct keys, you produce 100,000 rows per minute. Make sure your downstream sink — whether a Kafka topic, a database, or a dashboard — can handle this write volume without becoming a bottleneck.
Using Processing Time When Event Time Is Available
Processing time is simpler to set up, but it makes your results non-deterministic. If your events carry a reliable timestamp, prefer event time. This is especially important for sliding windows because overlapping intervals magnify any timing inconsistencies introduced by processing-time semantics.
Sliding Windows vs. Cumulate Windows
Flink SQL also offers the CUMULATE() table-valued function, which is sometimes confused with sliding windows. The difference is important.
- HOP (Sliding): Every window has the same fixed size. Windows start at regular intervals determined by the slide. All windows are independent and overlap equally.
- CUMULATE: Windows share a common start point and grow incrementally until they reach a maximum size, then reset. Early windows are smaller than later ones within the same cycle.
For example, CUMULATE(TABLE t, DESCRIPTOR(ts), INTERVAL '5' MINUTES, INTERVAL '1' HOUR) produces windows of 5 minutes, 10 minutes, 15 minutes, …, up to 1 hour, all starting from the same aligned point. Then the cycle resets.
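The difference is easy to see by generating one cycle of windows for each function (a toy sketch in Python with minutes as integers, not the Flink API):

```python
def cumulate_cycle(cycle_start, step, max_size):
    """Windows emitted by one CUMULATE cycle: a shared start and a
    window end that grows by `step` until it reaches `max_size`."""
    return [(cycle_start, cycle_start + end)
            for end in range(step, max_size + 1, step)]

def hop_cycle(first_start, slide, size, count):
    """The first `count` hop windows: fixed size, start advancing by slide."""
    return [(first_start + i * slide, first_start + i * slide + size)
            for i in range(count)]

# CUMULATE with a 5-minute step and 1-hour max size: 12 growing windows.
print(cumulate_cycle(0, 5, 60)[:3])   # [(0, 5), (0, 10), (0, 15)]
# HOP with a 5-minute slide and 1-hour size: every window a full hour wide.
print(hop_cycle(0, 5, 60, 3))         # [(0, 60), (5, 65), (10, 70)]
```

The cumulate windows all share a start and grow toward (0, 60); the hop windows keep a constant one-hour width while their start advances.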
When to use cumulate instead of hop: Cumulate windows are ideal when you need progressive partial results within a fixed reporting period — such as showing “revenue so far this hour” updated every 5 minutes. Sliding windows are better when you need a true rolling aggregation with a consistent window size across all results.
| Feature | HOP (Sliding) | CUMULATE |
|---|---|---|
| Window size | Fixed | Growing within cycle |
| Overlap pattern | Uniform | Progressive |
| Use case | Moving averages, rolling alerts | Partial period reports |
| State cost | size / slide per event | max_size / step per event |
Choose based on whether your business logic needs a constant lookback window (sliding) or incremental progress toward a period boundary (cumulate).
Wrapping Up
Sliding windows are one of the most powerful tools in the Flink SQL toolkit. They give you the ability to compute continuously updated, overlapping aggregations — moving averages, rolling counts, trend detectors, and rate limiters — that are impossible with non-overlapping tumbling windows alone.
The key decisions come down to two numbers: window size and slide interval. Get the ratio right, and you get responsive, resource-efficient streaming analytics. Get it wrong, and you get state explosion and checkpoint failures. Start conservative, measure your state size and checkpoint duration, and tighten the slide only when the business demands it.