APACHE ICEBERG

Real-Time CDC to Open Lakehouse

Stream database changes directly to Apache Iceberg tables. Open format with zero vendor lock-in. Query from Spark, Trino, Athena, Databricks, or any Iceberg-compatible engine. ACID transactions and time travel included.

Open Table Format, Zero Lock-in

Write once, query from anywhere with full transactional guarantees

Open Table Format

No vendor lock-in. Query Iceberg tables from Spark, Trino, Presto, Dremio, Athena, and more.

ACID Transactions

Full transactional guarantees for concurrent reads and writes. Readers never see partially written data.

Time Travel

Query historical snapshots. Roll back to any retained snapshot for debugging or compliance.
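As a rough illustration of what a time travel query looks like, the sketch below builds Spark SQL statements using Iceberg's TIMESTAMP AS OF and VERSION AS OF clauses (supported in Spark 3.3 and later). The table name lake.db.orders and the snapshot ID are hypothetical, and other engines such as Trino use slightly different syntax.

```python
from datetime import datetime


def time_travel_by_timestamp(table: str, as_of: datetime) -> str:
    """Build a Spark SQL query that reads the table as it was at `as_of`."""
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"


def time_travel_by_snapshot(table: str, snapshot_id: int) -> str:
    """Build a Spark SQL query that reads one specific snapshot."""
    return f"SELECT * FROM {table} VERSION AS OF {snapshot_id}"


# Hypothetical table name and snapshot ID, for illustration only.
by_time = time_travel_by_timestamp("lake.db.orders", datetime(2024, 1, 15, 9, 30))
by_snapshot = time_travel_by_snapshot("lake.db.orders", 10963874102873)
print(by_time)
print(by_snapshot)
```

Either string could then be passed to a Spark session configured with an Iceberg catalog.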

Schema Evolution

Add, rename, or drop columns without rewriting data. Source schema changes flow through to your Iceberg tables automatically.
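Iceberg can evolve a schema without rewriting data because it tracks columns by a stable field ID rather than by name. The following is a simplified in-memory model of that idea, not Iceberg's actual file format; the column names and rows are made up for illustration.

```python
# Data files store values keyed by field ID, not by column name.
data_file_rows = [
    {1: 101, 2: "alice@example.com"},
    {1: 102, 2: "bob@example.com"},
]

# The table schema maps each field ID to its current column name.
schema = {1: "id", 2: "email"}


def read_rows(rows, schema):
    """Project stored rows through the schema; missing fields read as None."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


# Rename a column: only the schema metadata changes.
schema[2] = "contact_email"
# Add a column: existing files hold no value for it, so it reads as None.
schema[3] = "signup_source"

evolved = read_rows(data_file_rows, schema)
print(evolved[0])
```

The stored rows are never touched; only the mapping from field ID to name changes.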

Hidden Partitioning

Iceberg derives partition values from column data, so queries get automatic partition pruning without referencing partition columns.
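To make the idea concrete, here is a toy model of pruning with Iceberg's day partition transform, which maps a timestamp to the number of days since 1970-01-01. The file names and timestamps are hypothetical, and real Iceberg planning works from manifest metadata rather than a Python dict.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Mimic Iceberg's day transform: whole days since 1970-01-01 (UTC)."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


# Hypothetical data files, each tagged with the partition value it belongs to.
files = {
    "f1.parquet": day_transform(datetime(2024, 3, 1, 8, 0, tzinfo=timezone.utc)),
    "f2.parquet": day_transform(datetime(2024, 3, 2, 23, 59, tzinfo=timezone.utc)),
    "f3.parquet": day_transform(datetime(2024, 3, 5, 12, 0, tzinfo=timezone.utc)),
}


def prune(files, start: datetime, end: datetime):
    """Keep only files whose day partition overlaps [start, end]."""
    lo, hi = day_transform(start), day_transform(end)
    return sorted(name for name, day in files.items() if lo <= day <= hi)


# The filter is on the timestamp column itself; no partition column is named.
selected = prune(
    files,
    datetime(2024, 3, 2, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 3, 0, 0, tzinfo=timezone.utc),
)
print(selected)
```

Only the file whose day value falls inside the filter range is scanned; the other two are skipped without being opened.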

Any Cloud Storage

Write to S3, GCS, Azure Blob, MinIO, or any S3-compatible storage.

Supported Catalogs

Connect to your existing metadata catalog

AWS Glue

Native Glue Data Catalog integration

Hive Metastore

Self-hosted Hive Metastore

REST Catalog

Tabular, Nessie, or custom REST

Snowflake

Polaris Catalog integration

What You Can Build

Build open data lakehouses without vendor lock-in
Enable multi-engine analytics (Spark + Trino + Athena)
Analyze historical data with time travel queries
Run cost-effective analytics on cloud object storage
Prepare ML training data with point-in-time consistency

How It Works

Streamkap captures changes from your source databases using CDC and streams them directly to Apache Iceberg tables on your cloud object storage. Query with any engine.

  1. Capture: CDC captures changes from PostgreSQL, MySQL, MongoDB, and more
  2. Transform: Apply SQL transforms, filtering, and masking in-flight
  3. Write: Data written to Iceberg tables with ACID guarantees
  4. Query: Use Spark, Trino, Athena, or any Iceberg-compatible engine

Source DB (PostgreSQL, MySQL, etc.) → Streamkap (CDC + Transform) → Iceberg (S3 / GCS / Azure)
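The capture, transform, and write steps above can be pictured with a small in-memory sketch. The event shape (op, key, row) and the mask_email transform are assumptions made for illustration; they are not Streamkap's actual event format or API, and a real pipeline commits to Iceberg tables rather than a Python dict.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to an in-memory table keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Upsert: the latest row image replaces any earlier one.
        table[key] = event["row"]
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def mask_email(event: dict) -> dict:
    """Example in-flight transform: mask the local part of an email field."""
    row = event.get("row")
    if row and "email" in row:
        _, _, domain = row["email"].partition("@")
        event = {**event, "row": {**row, "email": f"***@{domain}"}}
    return event


# Hypothetical change events as they might arrive from a source database.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "email": "ann@example.com"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "email": "bo@example.com"}},
    {"op": "update", "key": 1, "row": {"id": 1, "email": "ann@corp.example"}},
    {"op": "delete", "key": 2},
]

table = {}
for event in events:
    apply_change(table, mask_email(event))
print(table)
```

After replaying all four events, the table holds only the latest masked image of row 1, which mirrors the state a query engine would see once the changes are committed.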

Query with Any Engine

Iceberg's open format is supported by major query engines, including:

Apache Spark
Trino
Presto
Amazon Athena
Dremio
Snowflake
Databricks
StarRocks

Open lakehouse with real-time CDC. Zero lock-in.

Query from any engine. Write to any cloud storage. Set up in minutes.