Flink SQL vs ksqlDB: Which Stream SQL Engine Should You Use?
A detailed comparison of Flink SQL and ksqlDB for stream processing. Compare architecture, SQL capabilities, state management, ecosystem, and production readiness.
Choosing between Flink SQL and ksqlDB is one of the most consequential decisions a data engineering team can make when building real-time streaming pipelines. Both engines let you write SQL against unbounded data streams, but they differ fundamentally in architecture, SQL coverage, ecosystem breadth, and - increasingly - in their long-term viability.
For years, ksqlDB was the default answer for teams already running Confluent Platform or Confluent Cloud. It offered a familiar SQL interface directly on top of Kafka Streams, making it easy to build materialized views, streaming aggregations, and simple enrichment pipelines without leaving the Kafka ecosystem. But the streaming space shifted dramatically in January 2023 when Confluent acquired Immerok, a company founded by the original creators of Apache Flink, and began positioning Flink SQL as its strategic stream processing layer.
That acquisition changed the calculus. Today, the question is not just which engine has better SQL support or lower latency - it is which engine has a future. This guide provides a direct, honest comparison to help you make that decision.
Architecture Comparison
The architectural differences between Flink SQL and ksqlDB run deep, and they explain most of the capability gaps you will encounter in practice.
Flink SQL: Distributed Dataflow Engine
Apache Flink is a full-fledged distributed dataflow engine. When you submit a Flink SQL query, the optimizer compiles it into a directed acyclic graph (DAG) of operators that execute across a cluster of TaskManagers, coordinated by a JobManager. Each operator runs in parallel across multiple task slots, and Flink manages state, checkpointing, and failure recovery at the framework level.
This architecture means Flink SQL can scale horizontally to hundreds of nodes, process millions of events per second, and recover from failures by replaying from the last consistent checkpoint - all without requiring manual intervention. Flink’s runtime is not tied to any particular messaging system; it can read from Kafka, Kinesis, file systems, databases via CDC connectors, and more.
ksqlDB: Kafka Streams Under the Hood
ksqlDB is fundamentally a SQL layer on top of Kafka Streams, which itself is a client library for building stream processing applications that read from and write to Apache Kafka. When you issue a ksqlDB query, it is translated into a Kafka Streams topology that runs as a set of consumer group instances.
This design makes ksqlDB inherently Kafka-centric. Every input must come from a Kafka topic, and every output is written to a Kafka topic. Scaling is governed by topic partition counts - you cannot have more processing instances than partitions. State is backed by RocksDB locally and replicated via changelog topics in Kafka itself.
Architectural Summary
| Dimension | Flink SQL | ksqlDB |
|---|---|---|
| Runtime foundation | Distributed dataflow engine | Kafka Streams client library |
| Cluster model | JobManager + TaskManagers | Consumer group instances |
| Source/sink coupling | Any supported connector | Kafka topics only |
| Parallelism model | Configurable per operator | Bounded by partition count |
| Fault tolerance | Distributed snapshots (Chandy-Lamport) | Kafka changelog replication |
| Deployment | Standalone, YARN, Kubernetes, managed cloud | Confluent Cloud or self-hosted |
SQL Capabilities
This is where the gap between Flink SQL and ksqlDB becomes most apparent. Flink SQL implements a substantial subset of ANSI SQL with streaming extensions, while ksqlDB provides a more limited, Kafka-specific SQL dialect.
Window Functions
Both engines support the standard window types for streaming aggregations, though Flink adds a fourth that ksqlDB lacks:
| Window Type | Flink SQL | ksqlDB |
|---|---|---|
| Tumbling (fixed-size, non-overlapping) | Yes | Yes |
| Hopping (fixed-size, overlapping) | Yes | Yes |
| Session (gap-based) | Yes | Yes |
| Cumulate (expanding within a fixed interval) | Yes | No |
Flink’s cumulate windows are particularly useful for computing running aggregates within a period - for example, a cumulative hourly count that updates every minute. ksqlDB has no equivalent.
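As a sketch of how this looks in Flink SQL (the `orders` table and its event-time column `order_time` are illustrative names, not from the original article):

```sql
-- Cumulative count within each hour, with an updated result every minute.
SELECT window_start, window_end, COUNT(*) AS order_count
FROM TABLE(
  CUMULATE(TABLE orders, DESCRIPTOR(order_time),
           INTERVAL '1' MINUTE,   -- step: emit an updated row every minute
           INTERVAL '1' HOUR))    -- max size: the window resets every hour
GROUP BY window_start, window_end;
```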
Joins
Joins are where Flink SQL pulls decisively ahead. Both engines support the basic streaming join types, but Flink adds several that ksqlDB simply cannot express.
| Join Type | Flink SQL | ksqlDB |
|---|---|---|
| Stream-stream (inner, left, full) | Yes | Yes |
| Stream-table (lookup enrichment) | Yes | Yes |
| Temporal join (point-in-time) | Yes | No |
| Interval join (time-bounded) | Yes | No |
| Lookup join (external system) | Yes | No |
| Table-table join | Yes | Yes (limited) |
Temporal joins allow you to join a stream against a versioned table as of the event’s timestamp - essential for scenarios like joining transactions against the exchange rate that was valid at the time the transaction occurred. Interval joins restrict the join condition to events within a specified time range, which is both semantically correct and far more efficient for large-scale stream-stream joins, since state can be expired once the time bound has passed.
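Sketched in Flink SQL, the two join types look like this (table and column names are illustrative; `exchange_rates` is assumed to be declared as a versioned table with a primary key and watermark):

```sql
-- Temporal join: use the exchange rate valid at the transaction's event time.
SELECT t.txn_id, t.amount * r.rate AS amount_usd
FROM transactions AS t
JOIN exchange_rates FOR SYSTEM_TIME AS OF t.txn_time AS r
  ON t.currency = r.currency;

-- Interval join: only match shipments within 4 hours of the order.
SELECT o.order_id, s.ship_time
FROM orders AS o
JOIN shipments AS s
  ON o.order_id = s.order_id
 AND s.ship_time BETWEEN o.order_time
                     AND o.order_time + INTERVAL '4' HOUR;
```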
MATCH_RECOGNIZE (Flink Only)
Flink SQL supports the MATCH_RECOGNIZE clause from the SQL:2016 standard, enabling complex event processing (CEP) directly in SQL. This lets you define patterns over rows using regular expressions and extract matched sequences.
For example, you can detect a pattern like “price dropped three consecutive times then rose” in a stock ticker stream - something that would require custom application code or a separate CEP library in ksqlDB.
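A sketch of that pattern in Flink SQL, assuming a hypothetical `ticker` table with `symbol`, `price`, and an event-time column `event_time`:

```sql
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY event_time
  MEASURES
    FIRST(DOWN.price) AS first_drop_price,
    LAST(DOWN.price)  AS bottom_price,
    UP.price          AS recovery_price
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (DOWN{3} UP)          -- three consecutive drops, then a rise
  DEFINE
    DOWN AS DOWN.price < PREV(DOWN.price),
    UP   AS UP.price   > PREV(UP.price)
);
```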
OVER Windows (Flink Only)
Flink SQL supports OVER window aggregations, which compute values for each row based on a window of related rows. This includes ROW_NUMBER(), RANK(), LAG(), LEAD(), and running aggregates like SUM() OVER (ORDER BY event_time ROWS BETWEEN 10 PRECEDING AND CURRENT ROW).
ksqlDB has no support for OVER clauses, which limits its ability to express row-level analytics within a stream.
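For example, a running sum over the last ten rows per user might look like this in Flink SQL (names are illustrative; in streaming mode `event_time` must be the table's event-time attribute):

```sql
SELECT
  user_id,
  amount,
  SUM(amount) OVER (
    PARTITION BY user_id
    ORDER BY event_time
    ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
  ) AS rolling_sum
FROM payments;
```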
Set Operations (Flink Only)
Flink SQL supports standard set operations - UNION, UNION ALL, INTERSECT, and EXCEPT - letting you combine or compare multiple query results. ksqlDB does not support these operations, making it harder to build pipelines that merge or diff streams declaratively.
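As a minimal sketch, merging two event streams declaratively (stream names are illustrative):

```sql
-- Combine click events from two sources into a single unified stream.
SELECT user_id, url, event_time FROM web_clicks
UNION ALL
SELECT user_id, url, event_time FROM mobile_clicks;
```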
Data Sources and Sinks
One of the most practical differences between Flink SQL and ksqlDB is where your data can come from and where it can go.
ksqlDB operates exclusively within the Kafka ecosystem: data must already live in a Kafka topic before ksqlDB can process it, and results can only land in another Kafka topic. If your data sits in a PostgreSQL database, an S3 bucket, or behind an HTTP endpoint, you need Kafka Connect (or another ingestion tool) to get it into Kafka first.
Flink SQL, by contrast, has a broad connector ecosystem that lets you read from and write to a wide range of systems directly:
| Category | Flink SQL Sources/Sinks | ksqlDB Sources/Sinks |
|---|---|---|
| Message queues | Kafka, Kinesis, Pulsar, RabbitMQ | Kafka only |
| Databases | JDBC, MySQL CDC, PostgreSQL CDC, MongoDB CDC | Kafka topics only (via Connect) |
| File systems | S3, HDFS, local FS | Kafka topics only |
| Data warehouses | Hive, Iceberg, Delta Lake, Hudi | Kafka topics only |
| Key-value stores | HBase, Redis, Elasticsearch | Kafka topics only |
This means Flink SQL can serve as a unified processing layer across your entire data infrastructure, while ksqlDB is limited to what has already been materialized into Kafka.
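As an illustration of that connector model, here is a sketch of declaring a Kafka source and a JDBC sink directly in Flink SQL; the topic, broker address, URL, and table names are placeholders, and option availability depends on the connector versions you deploy:

```sql
-- A Kafka topic exposed as a Flink table, with an event-time watermark.
CREATE TABLE orders (
  order_id   STRING,
  user_id    STRING,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

-- A PostgreSQL table as a sink, with no Kafka Connect in the path.
CREATE TABLE orders_by_user (
  user_id STRING,
  total   DECIMAL(10, 2),
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://db-host:5432/analytics',
  'table-name' = 'orders_by_user'
);
```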
State Management
Both engines use RocksDB for local state storage, but they differ in how state is protected and how it scales.
Flink SQL uses a distributed snapshot algorithm (based on the Chandy-Lamport protocol) to periodically take consistent checkpoints of all operator state across the cluster. These checkpoints are stored in a durable external system (typically S3, HDFS, or GCS). On failure, Flink restores state from the latest checkpoint and replays a small amount of data from source offsets. Flink also supports incremental checkpointing for large state, which only persists the delta since the last checkpoint - important for jobs with terabytes of state.
ksqlDB relies on Kafka’s own replication for state durability. Each stateful operator maintains a RocksDB instance locally and writes state changes to a Kafka changelog topic. On failure, a new instance restores state by consuming the changelog topic from the beginning. For large state stores, this recovery can take significant time since it must replay the entire changelog - a well-known pain point in production Kafka Streams deployments.
| State Feature | Flink SQL | ksqlDB |
|---|---|---|
| Local store | RocksDB | RocksDB |
| Durability mechanism | External checkpoint storage (S3, HDFS) | Kafka changelog topics |
| Incremental checkpoints | Yes | No (full changelog replay) |
| Recovery speed | Fast (restore from checkpoint + short replay) | Slower (full changelog consumption) |
| State size limit | Terabytes (with incremental checkpoints) | Limited by changelog replay time |
Exactly-Once Semantics
Both engines support exactly-once processing semantics, but through different mechanisms.
Flink SQL achieves exactly-once through its checkpoint-based two-phase commit protocol. When writing to external sinks that support transactions (like Kafka), Flink pre-commits data as part of the checkpoint and finalizes the commit only when the checkpoint succeeds. This provides end-to-end exactly-once guarantees from source to sink.
ksqlDB inherits its exactly-once guarantees from Kafka Streams, which uses Kafka’s transactional producer APIs. This works well for Kafka-to-Kafka processing - reads, transforms, and writes are executed within a single Kafka transaction. However, exactly-once guarantees for external sinks depend on the specific Kafka Connect sink connector and its idempotency capabilities.
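In Flink SQL, the delivery guarantee is configured on the sink table. A sketch for the Kafka connector is below; the option names follow recent versions of the Flink Kafka connector, checkpointing must be enabled on the job, and the topic and broker values are placeholders:

```sql
CREATE TABLE enriched_orders (
  order_id STRING,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'enriched-orders',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  -- Transactional writes, committed when the checkpoint completes.
  'sink.delivery-guarantee' = 'exactly-once',
  'sink.transactional-id-prefix' = 'enriched-orders-job'
);
```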
Ecosystem and Tooling
Development Experience
| Tool | Flink SQL | ksqlDB |
|---|---|---|
| Interactive CLI | Flink SQL Client | ksqlDB CLI |
| REST API | Flink SQL Gateway | ksqlDB REST API |
| IDE support | IntelliJ (Flink plugin), VS Code | Limited (Confluent VS Code extension) |
| Catalog integration | Hive Metastore, Confluent Schema Registry, AWS Glue | Confluent Schema Registry |
| Testing framework | Flink SQL test harness, MiniCluster | ksqlDB test runner |
Managed Services
Flink SQL is available as a managed service from multiple providers - Confluent Cloud, AWS (Amazon Managed Service for Apache Flink), Alibaba Cloud, and platforms like Streamkap that offer managed Flink alongside CDC connectors. This provider diversity gives you flexibility and reduces vendor lock-in.
ksqlDB is available only through Confluent - either as Confluent Cloud ksqlDB or as part of the self-managed Confluent Platform. There is no alternative managed provider, which creates significant vendor dependency.
The Confluent Factor
This is the elephant in the room, and it deserves direct attention.
In January 2023, Confluent acquired Immerok, a startup founded by the original creators of Apache Flink. Within months, Confluent launched “Confluent Cloud for Apache Flink” as a first-class managed service, investing heavily in its Flink SQL integration with Schema Registry, Kafka topics as Flink tables, and a unified governance layer.
The signals since then have been consistent and clear:
- Engineering investment has shifted visibly toward Flink. Confluent’s blog, conference talks, and documentation increasingly position Flink SQL as the primary stream processing interface.
- ksqlDB development has slowed. Commit frequency to the ksqlDB open-source repository has declined, and major new features have been scarce.
- Product roadmap presentations at Kafka Summit and Current conferences have featured Flink prominently, with ksqlDB receiving diminishing attention.
- Confluent’s pricing model for Flink SQL on Confluent Cloud has been made more competitive, suggesting a push to migrate users.
Confluent has not issued a formal end-of-life notice for ksqlDB. But the trajectory is unmistakable. Organizations starting new projects on ksqlDB today are building on a platform whose primary maintainer is clearly investing elsewhere. This is not speculation - it is a straightforward reading of Confluent’s public actions and statements.
For teams already running ksqlDB in production, there is no immediate crisis. Existing deployments will continue to work. But for new projects and long-term architectural decisions, Flink SQL is the safer bet.
Performance Comparison
Direct performance comparisons between Flink SQL and ksqlDB are difficult to generalize because results depend heavily on query complexity, state size, cluster configuration, and data characteristics. That said, some general observations hold:
| Performance Dimension | Flink SQL | ksqlDB |
|---|---|---|
| Throughput ceiling | Very high (millions of events/sec per cluster) | Moderate (bounded by partition count and consumer throughput) |
| Latency (simple transforms) | Low milliseconds | Low milliseconds |
| Latency (stateful aggregations) | Low-to-moderate milliseconds | Low-to-moderate milliseconds |
| Scaling mechanism | Add TaskManager slots (independent of data partitioning) | Add instances (capped at partition count) |
| Large state performance | Strong (incremental checkpoints, async state backend) | Degrades with state size (changelog replay) |
| Recovery time | Seconds to low minutes (checkpoint restore) | Minutes to hours (changelog replay for large state) |
For simple, stateless transformations on Kafka data, performance between the two is comparable. The differences emerge as workloads grow in complexity and scale: Flink SQL’s architecture handles large state, complex queries, and high parallelism more gracefully.
Decision Framework
| Criteria | Choose Flink SQL | Choose ksqlDB |
|---|---|---|
| SQL complexity | Complex joins, CEP, window analytics | Simple filters, aggregations, enrichments |
| Data sources | Multiple systems (Kafka, databases, files) | Kafka topics exclusively |
| Scale requirements | High throughput, large state, many operators | Moderate throughput, manageable state |
| Vendor strategy | Multi-cloud, avoid lock-in | Already committed to Confluent ecosystem |
| Long-term investment | New project, multi-year horizon | Legacy system with existing ksqlDB jobs |
| Team expertise | Willing to learn Flink SQL (similar to ANSI SQL) | Familiar with ksqlDB, no time to migrate |
| Managed service options | Multiple providers available | Confluent only |
The bottom line: For new projects in 2026, Flink SQL is the recommended choice in nearly every scenario. It offers broader SQL capabilities, more connector options, better scalability characteristics, multi-vendor managed service availability, and a clear future as Confluent’s own strategic direction. ksqlDB remains a viable option only for teams with significant existing investment and no near-term plans to expand beyond simple Kafka-centric use cases.
Migrating from ksqlDB to Flink SQL
For teams currently running ksqlDB that want to move to Flink SQL, the migration path is achievable but requires planning. Here is a practical approach:
1. Inventory Your ksqlDB Queries
Start by cataloging every persistent query running in your ksqlDB deployment. Classify each query by type: simple transforms (filters, projections), aggregations (GROUP BY with windows), joins (stream-stream, stream-table), and materialized views (CREATE TABLE AS SELECT).
2. Map ksqlDB Concepts to Flink SQL
Most ksqlDB concepts have direct Flink SQL equivalents:
| ksqlDB Concept | Flink SQL Equivalent |
|---|---|
| CREATE STREAM | CREATE TABLE with Kafka connector (append mode) |
| CREATE TABLE | CREATE TABLE with Kafka connector (upsert mode) |
| CSAS (CREATE STREAM AS SELECT) | INSERT INTO … SELECT |
| CTAS (CREATE TABLE AS SELECT) | INSERT INTO … SELECT (with primary key) |
| EMIT CHANGES | Default streaming behavior |
| EMIT FINAL | Windowed aggregation with allowed lateness |
| ROWTIME | Event-time watermark configuration |
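As a concrete sketch of the CSAS mapping (names are illustrative, and the Flink version assumes `orders` and `big_orders` have already been declared as Kafka-backed tables):

```sql
-- ksqlDB: persistent query materializing a filtered stream.
CREATE STREAM big_orders AS
  SELECT order_id, amount
  FROM orders
  WHERE amount > 1000
  EMIT CHANGES;

-- Flink SQL: continuous INSERT into a pre-declared table.
INSERT INTO big_orders
SELECT order_id, amount
FROM orders
WHERE amount > 1000;
```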
3. Translate Queries Incrementally
Start with stateless transformations (filters, projections, simple maps), which translate almost one-to-one. Then move to windowed aggregations, adjusting for Flink SQL’s window syntax. Joins require the most attention - ksqlDB’s WITHIN clause maps to Flink SQL’s interval joins, and stream-table joins may benefit from Flink’s temporal join support for more precise semantics.
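For example, a tumbling-window count translates roughly as follows (illustrative names; the Flink version assumes `order_time` is the table's event-time attribute):

```sql
-- ksqlDB: windowed aggregation on a stream.
CREATE TABLE orders_per_minute AS
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY user_id
  EMIT CHANGES;

-- Flink SQL: the same aggregation via a windowing table-valued function.
INSERT INTO orders_per_minute
SELECT user_id, window_start, COUNT(*) AS order_count
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' MINUTE))
GROUP BY user_id, window_start;
```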
4. Run in Parallel
Deploy Flink SQL jobs alongside existing ksqlDB queries, writing to separate output topics. Compare results over a validation period to ensure semantic equivalence before cutting over.
5. Consider a Managed Platform
Running your own Flink cluster adds operational complexity that may offset the benefits of migration. Managed Flink services - including Streamkap’s managed Flink with built-in CDC connectors - can significantly reduce the operational burden of migration, letting your team focus on query translation rather than infrastructure provisioning.
The Bottom Line
The Flink SQL vs ksqlDB comparison is no longer a close contest. Flink SQL offers a more capable SQL dialect, a broader connector ecosystem, superior scalability, and - critically - a clear future backed by both the open-source Apache community and Confluent’s strategic investment.
ksqlDB served an important role in making stream processing accessible to SQL-literate developers. But as the streaming space matures, the limitations of its Kafka-only architecture and its uncertain product trajectory make it hard to justify for new projects.
If you are starting fresh, choose Flink SQL. If you are running ksqlDB today, plan your migration timeline thoughtfully - but do plan it.