
FAQ Apache Iceberg

AUTHOR BIO
Product Marketing Manager at Streamkap

July 1, 2025


What is Apache Iceberg, exactly?

Apache Iceberg is an open table format that makes files in your lake behave like real database tables. It adds ACID transactions, schema and partition evolution, time travel, and table-level metadata—without leaving S3/ADLS/GCS.
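
For illustration, here is a minimal Spark SQL sketch of creating an Iceberg table; the catalog, database, and column names are placeholders, not something from this post:

CREATE TABLE my_catalog.db.events (
  id BIGINT,
  event_time TIMESTAMP,
  payload STRING)
USING iceberg
PARTITIONED BY (days(event_time));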

How does Apache Iceberg unify lakes and warehouses?

By giving you atomic, isolated table updates for both batch and streaming. The same table can serve BI, ML, and real-time jobs without brittle glue code. (Example: Netflix uses Iceberg at petabyte scale to unify historical and streaming analytics.)

Is Apache Iceberg cloud-native?

Yes. Iceberg is designed for object storage. It uses rich metadata (manifests, partition stats) to prune scans, so engines touch only the files they need—cutting latency and cost. (Example: Expedia runs Iceberg on S3 for cost-efficient travel analytics.)

Will Apache Iceberg fit my stack?

It’s vendor-neutral and widely integrated: Spark, Flink, Trino/Presto, Kafka (ingest/CDC), plus a growing ecosystem of query engines and orchestration tools. You’re not locked into one vendor or runtime.

Can Apache Iceberg handle real-time and CDC?

Yep. Streaming upserts and incremental planning let you land CDC from Kafka/Flink and query fresh data with low latency—while keeping the canonical table in your lake.

What about flexibility as needs evolve?

Iceberg’s metadata-driven design supports hidden partitioning and painless evolution (e.g., daily → hourly) without rewriting queries. Compaction/optimization jobs keep small files in check and storage costs under control.

How does Iceberg support Time Travel?

Iceberg’s immutable snapshots enable you to query historical states for auditing or debugging purposes.

Example:

SELECT * FROM table FOR SYSTEM_TIME AS OF '2025-08-12 15:00:00';

Note: Remember to expire snapshots to manage metadata.
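
As a rough sketch of that cleanup (the table name and timestamp are placeholders), snapshot expiration can be run as a Spark procedure:

CALL system.expire_snapshots(table => 'table', older_than => TIMESTAMP '2025-08-01 00:00:00', retain_last => 10);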

Can I evolve my schema with Iceberg?

Yes. You can add, drop, or change columns without rewriting any data.

Example:

ALTER TABLE table ADD COLUMN customer_segment STRING;
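
The same applies to dropping, renaming, or widening columns (widening is limited to compatible types such as int → bigint or float → double); the column names below are illustrative:

ALTER TABLE table DROP COLUMN legacy_flag;
ALTER TABLE table RENAME COLUMN customer_segment TO segment;
ALTER TABLE table ALTER COLUMN amount TYPE double;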

What is Hidden Partitioning?

Iceberg derives partition values from column transforms stored in metadata, so queries don’t have to filter on partition columns and the partition spec can evolve without migrating or rewriting existing data.

Example:

ALTER TABLE table ADD PARTITION FIELD hours(event_time);

Use Case: Optimize streaming data pipelines by switching to hourly partitions.

Does Iceberg support ACID transactions?

Yes. Iceberg ensures data integrity even with concurrent writes.

Example:

MERGE INTO table t USING source s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);

How does Iceberg handle Row-Level Operations?

Iceberg supports both Merge-on-Read (MoR) and Copy-on-Write (CoW) strategies for row-level changes, and handles Change Data Capture (CDC) workloads using position and equality deletes.

Example:

DELETE FROM table WHERE user_id = '123';

Use Case: Perform real-time transaction updates for financial systems.
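
The MoR vs. CoW behavior can be chosen per operation through table properties; a sketch (the property keys and values are standard Iceberg write modes, the table name is a placeholder):

ALTER TABLE table SET TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode'  = 'merge-on-read');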

How is metadata managed in Iceberg?

Iceberg’s hierarchical metadata (metadata files, manifest lists, and manifests) scales across catalogs such as REST, AWS Glue, and Hive Metastore. You can learn more in the Iceberg documentation on catalogs.
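
For example, pointing Spark at a REST catalog is mostly a matter of configuration; the catalog name, URI, and warehouse path below are hypothetical:

spark.sql.catalog.analytics           = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.analytics.type      = rest
spark.sql.catalog.analytics.uri       = https://catalog.example.com
spark.sql.catalog.analytics.warehouse = s3://my-bucket/warehouse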

How does Iceberg improve performance?

Iceberg improves scan performance through data compaction, which merges many small files into fewer, larger ones and reduces metadata overhead.

Example:

CALL system.rewrite_data_files(table => 'table', options => map('target-file-size-bytes', '536870912'));

What are branching and tagging?

Iceberg supports isolated writes through branches and reproducible reads through tags: a branch lets you stage and validate changes before publishing them, while a tag pins a named snapshot.

Example:

ALTER TABLE table CREATE BRANCH audit_branch;
ALTER TABLE table CREATE TAG release_2025_08_12;

How does Iceberg handle streaming data?

Iceberg unifies batch and streaming, allowing you to build real-time data lakes.

Example:

INSERT INTO transactions
SELECT transaction_id, amount, event_time
FROM kafka_stream
WHERE event_time >= '2025-08-12';

How do I connect Streamkap to Apache Iceberg?

  • Tutorial: MySQL → Iceberg – Enable CDC in MySQL, connect it in Streamkap, and stream changes directly into Iceberg tables.
  • Documentation: Explore the official guides to master every detail of your Streamkap–Iceberg pipeline.
  • Sign up: Create your Streamkap account now and start moving data in minutes.
  • Related blog posts