
Engineering

February 25, 2026

10 min read

DynamoDB to Snowflake: Syncing NoSQL to Your Data Warehouse

Stream DynamoDB changes to Snowflake in real time using DynamoDB Streams and CDC. Learn how to handle document flattening, schema mapping, and keep analytics fresh.

TL;DR:

  • DynamoDB Streams captures every item-level change (INSERT, MODIFY, REMOVE) and can stream them to Snowflake for real-time analytics on NoSQL data.
  • The key challenge is mapping DynamoDB's schema-less, nested items to Snowflake's relational tables, which requires flattening, type conversion, and handling of the DynamoDB JSON format.
  • Managed CDC platforms abstract away DynamoDB Streams consumption, shard management, and Snowflake loading into a simple configuration.

DynamoDB excels as a low-latency key-value store for transactional workloads, but its query model is deliberately limited. You can look up items by partition key and sort key, run scans with filter expressions, and that is about it. The moment your team needs to join orders with customer profiles, calculate rolling revenue windows, or feed a machine learning feature store, you need that data somewhere else. Snowflake is a natural destination: it handles semi-structured data natively, scales compute independently from storage, and speaks the SQL your analysts already know.

The challenge is getting data from a schema-less NoSQL store into a columnar warehouse without losing fidelity, missing updates, or building a fragile pipeline you dread maintaining. This guide walks through the architecture, the type-system quirks, and the operational decisions you will face when streaming DynamoDB changes to Snowflake.

How DynamoDB Streams Works

DynamoDB Streams is a change data capture mechanism built into DynamoDB. When enabled on a table, every item-level write (put, update, delete) produces a stream record that is durably stored in a time-ordered sequence.

Stream View Types

You configure streams with one of four view types:

View Type           Record Contains
KEYS_ONLY           Partition key and sort key of the modified item
NEW_IMAGE           The entire item as it appears after the modification
OLD_IMAGE           The entire item as it appeared before the modification
NEW_AND_OLD_IMAGES  Both the before and after images of the item

For CDC to Snowflake, always choose NEW_AND_OLD_IMAGES. This gives you the full before-and-after state needed to reconstruct updates and compute diffs downstream.

Shard Model and Retention

Stream records are organized into shards, which are similar to Kafka partitions. Each shard has a fixed capacity and lifespan. As your table’s throughput changes, DynamoDB automatically splits and merges shards. Critically, stream records are retained for only 24 hours. If your consumer falls behind by more than a day, those records are gone. This constraint makes reliable, always-on consumption essential.

Table Write → Stream Record → Shard → Consumer (must read within 24h)

Each shard provides an iterator that your consumer advances through. When a shard closes (due to a split or merge), you must discover the child shards and continue reading from them. Managing this shard lineage is one of the more tedious parts of building a DynamoDB Streams consumer from scratch.
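For illustration, here is the kind of bookkeeping a from-scratch consumer takes on once shards split or merge: ordering shards so that a parent is fully read before its children. This is a minimal Python sketch assuming shard metadata shaped like the Shards list returned by DescribeStream (a ShardId plus an optional ParentShardId); a real consumer would also track per-shard checkpoints and periodically re-discover new shards.

```python
from collections import defaultdict

def order_shards_for_reading(shards):
    """Order shards so every parent is read before its children.

    `shards` is a list of dicts shaped like DescribeStream's
    StreamDescription.Shards entries: each has a "ShardId" and,
    after a split or merge, a "ParentShardId".
    """
    children = defaultdict(list)
    roots = []
    known_ids = {s["ShardId"] for s in shards}
    for s in shards:
        parent = s.get("ParentShardId")
        if parent in known_ids:
            children[parent].append(s)
        else:
            # Parent already trimmed (past the 24h retention) or absent.
            roots.append(s)
    # Depth-first walk: emit each shard before any of its children.
    ordered, stack = [], list(reversed(roots))
    while stack:
        shard = stack.pop()
        ordered.append(shard["ShardId"])
        stack.extend(reversed(children[shard["ShardId"]]))
    return ordered
```

In practice this ordering is what guarantees per-key event ordering survives a shard split: records for a given partition key land in the parent shard before its children.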

Architecture Options

There are three common patterns for moving DynamoDB data into Snowflake.

Option 1: Streams to Lambda to S3 to Snowpipe

This is the most widely documented approach. A Lambda function is triggered by DynamoDB Streams, transforms each record into a JSON or Parquet file, writes it to S3, and Snowpipe picks it up.

DynamoDB Streams → Lambda → S3 (staging) → Snowpipe → Snowflake

Pros: Serverless, no infrastructure to manage, fine-grained control over transformation logic.

Cons: Lambda concurrency limits, cold starts, error handling complexity, S3 file fragmentation (many small files hurt Snowflake query performance), and you own every line of the transformation code.

Option 2: Streams to Kinesis Data Streams to Firehose to S3

DynamoDB can replicate its stream into a Kinesis Data Stream, which then feeds Kinesis Data Firehose for buffered delivery to S3.

DynamoDB Streams → Kinesis Data Streams → Firehose → S3 → Snowpipe → Snowflake

Pros: Firehose handles buffering and batching (reducing small-file problems), Kinesis provides longer retention than DynamoDB Streams alone.

Cons: Additional AWS services to configure and pay for, Firehose adds latency through its buffering window (typically at least 60 seconds), and you still need transformation logic for the DynamoDB JSON format.

Option 3: Managed CDC Platform

A managed CDC platform like Streamkap connects directly to DynamoDB Streams as a source and writes to Snowflake as a destination. The platform handles shard management, checkpointing, type conversion, and Snowflake loading internally.

DynamoDB Streams → Streamkap → Snowflake

Pros: No Lambda functions, no S3 staging buckets, no Snowpipe configuration, automatic schema mapping and type conversion. Cons: Adds a third-party dependency.

The DynamoDB Type System

DynamoDB does not store plain JSON. It uses a typed attribute format where every value is wrapped in a type descriptor. This is the single biggest source of friction when loading DynamoDB data into a relational system.

Here are the DynamoDB type descriptors and their Snowflake equivalents:

DynamoDB Type  Descriptor  Example                          Snowflake Type
String         S           {"S": "hello"}                   VARCHAR
Number         N           {"N": "42.5"}                    NUMBER or FLOAT
Binary         B           {"B": "dGhpcw=="}                BINARY
Boolean        BOOL        {"BOOL": true}                   BOOLEAN
Null           NULL        {"NULL": true}                   NULL
List           L           {"L": [{"S": "a"}, {"N": "1"}]}  VARIANT (array)
Map            M           {"M": {"key": {"S": "val"}}}     VARIANT (object)
String Set     SS          {"SS": ["a", "b"]}               VARIANT (array)
Number Set     NS          {"NS": ["1", "2"]}               VARIANT (array)
Binary Set     BS          {"BS": ["dGhpcw=="]}             VARIANT (array)

A raw DynamoDB stream record for a simple order item looks like this:

{
  "orderId": {"S": "ORD-2026-001"},
  "customerId": {"S": "CUST-4821"},
  "totalAmount": {"N": "149.99"},
  "items": {"L": [
    {"M": {
      "sku": {"S": "SKU-100"},
      "qty": {"N": "2"},
      "price": {"N": "49.99"}
    }},
    {"M": {
      "sku": {"S": "SKU-205"},
      "qty": {"N": "1"},
      "price": {"N": "50.01"}
    }}
  ]},
  "isPrime": {"BOOL": true},
  "createdAt": {"S": "2026-02-25T10:30:00Z"}
}

Before this can land in Snowflake as a usable row, every type descriptor must be unwrapped. The {"S": "ORD-2026-001"} wrapper becomes just "ORD-2026-001", and {"N": "149.99"} becomes the number 149.99. Numbers in DynamoDB are always strings inside the descriptor, so you also need to cast them to numeric types.
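A recursive unwrapping pass is straightforward to sketch in Python (boto3 ships a similar deserializer; this hand-rolled version is illustrative, and choices like returning sets as plain lists and leaving binary values base64-encoded are assumptions, not a library API):

```python
def unwrap(value):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, inner), = value.items()  # each value carries exactly one descriptor
    if type_tag == "S":
        return inner
    if type_tag == "N":
        # Numbers arrive as strings; the float round-trip is a sketch.
        # Use decimal.Decimal if you need full DynamoDB number precision.
        n = float(inner)
        return int(n) if n.is_integer() else n
    if type_tag == "BOOL":
        return inner
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(v) for v in inner]
    if type_tag == "M":
        return {k: unwrap(v) for k, v in inner.items()}
    if type_tag in ("SS", "NS", "BS"):
        return list(inner)  # sets become plain lists; values stay as strings
    if type_tag == "B":
        return inner        # base64-encoded string; decode if you need raw bytes
    raise ValueError(f"Unknown DynamoDB type descriptor: {type_tag}")

def unwrap_item(item):
    """Unwrap every top-level attribute of a stream image."""
    return {k: unwrap(v) for k, v in item.items()}
```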

Document Flattening Strategies

Once type descriptors are stripped, you still need to decide how a nested, schema-less document maps to Snowflake’s columnar model.

Strategy 1: Full VARIANT Storage

Store each DynamoDB item as a single VARIANT column in Snowflake and use dot-notation queries at read time.

CREATE TABLE orders (
  pk VARCHAR,
  sk VARCHAR,
  item VARIANT,
  _streamkap_ts TIMESTAMP_NTZ
);

-- Query nested data
SELECT
  item:orderId::VARCHAR AS order_id,
  item:totalAmount::NUMBER(10,2) AS total,
  item:items[0]:sku::VARCHAR AS first_sku
FROM orders;

Pros: Zero schema management, handles any item shape. Cons: No columnar pruning benefits, every query must parse the VARIANT, harder for analysts unfamiliar with semi-structured syntax.

Strategy 2: Top-Level Flattening

Extract top-level attributes into dedicated columns during ingestion.

CREATE TABLE orders (
  order_id VARCHAR,
  customer_id VARCHAR,
  total_amount NUMBER(10,2),
  is_prime BOOLEAN,
  created_at TIMESTAMP_NTZ,
  items VARIANT,   -- nested list stays as VARIANT
  _cdc_operation VARCHAR,
  _cdc_timestamp TIMESTAMP_NTZ
);

Pros: Standard SQL queries, columnar storage benefits, analysts can use the table immediately. Cons: Requires defining the schema upfront, new top-level attributes need schema evolution.
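As a sketch of the ingestion-side work for this strategy, the following Python maps an already-unwrapped item onto the flattened columns above, filling NULL for missing attributes and keeping the nested list as JSON text destined for the VARIANT column. The COLUMN_MAP contents and function name are illustrative assumptions:

```python
import json

# Attributes promoted to dedicated Snowflake columns. The camelCase to
# snake_case mapping mirrors the DDL above and is illustrative.
COLUMN_MAP = {
    "orderId": "order_id",
    "customerId": "customer_id",
    "totalAmount": "total_amount",
    "isPrime": "is_prime",
    "createdAt": "created_at",
}

def flatten_top_level(item):
    """Build a row dict for the flattened orders table from an unwrapped item."""
    # Missing attributes become None, which loads as NULL.
    row = {col: item.get(attr) for attr, col in COLUMN_MAP.items()}
    # Nested structures stay as JSON text, loaded into VARIANT via PARSE_JSON.
    row["items"] = json.dumps(item["items"]) if "items" in item else None
    return row
```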

Strategy 3: Hybrid

Keep a flattened set of known columns plus a raw_item VARIANT column containing the full document. This gives you the best of both approaches: fast queries on known columns, and access to the full document when you need it.

Handling Updates and Deletes

With NEW_AND_OLD_IMAGES enabled, every stream record includes an eventName field:

  • INSERT: A new item was added. Only NewImage is present.
  • MODIFY: An existing item was updated. Both OldImage and NewImage are present.
  • REMOVE: An item was deleted. Only OldImage is present.
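A consumer can route on eventName to decide which image becomes the staged row. Here is a minimal sketch, assuming the images have already been converted from DynamoDB's typed format into plain dicts (the _cdc_* field names mirror the staging tables in this guide and are a naming convention, not a standard):

```python
def to_cdc_row(record):
    """Map one stream record to the row staged for Snowflake.

    Assumes NewImage/OldImage have already been unwrapped from
    DynamoDB's typed attribute format into plain dicts.
    """
    op = record["eventName"]        # INSERT | MODIFY | REMOVE
    images = record["dynamodb"]
    if op == "REMOVE":
        image = images["OldImage"]  # deletes carry only the before-image
    else:
        image = images["NewImage"]  # inserts and updates carry the after-image
    return {
        **image,
        "_cdc_operation": op,
        "_cdc_timestamp": images["ApproximateCreationDateTime"],
    }
```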

For Snowflake, you typically implement one of two patterns:

Append-only (immutable log): Write every event as a new row. Use QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY _cdc_timestamp DESC) = 1 to get the latest state. This preserves full history but increases storage.

Upsert (merge): Use Snowflake MERGE statements to apply inserts, updates, and deletes to a current-state table. This keeps the table compact and queryable with simple SELECT statements, but loses history unless you also maintain a separate audit table.

MERGE INTO orders_current AS target
USING orders_staging AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source._cdc_operation = 'REMOVE' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  target.customer_id = source.customer_id,
  target.total_amount = source.total_amount,
  target.is_prime = source.is_prime,
  target.updated_at = source._cdc_timestamp
WHEN NOT MATCHED AND source._cdc_operation != 'REMOVE' THEN INSERT
  (order_id, customer_id, total_amount, is_prime, updated_at)
  VALUES (source.order_id, source.customer_id, source.total_amount, source.is_prime, source._cdc_timestamp);

Schema-on-Read Challenges

DynamoDB tables are schema-less. Two items in the same table can have completely different attributes. This creates three problems when syncing to Snowflake:

Inconsistent attributes. Item A might have a shippingAddress attribute while Item B does not. Your pipeline needs to handle missing fields gracefully, either by inserting NULL or by using VARIANT columns for optional attributes.

Type polymorphism. The same attribute name might be a string in one item and a number in another. DynamoDB allows this; Snowflake does not (at least not within a single typed column). Your transformation layer must detect and handle these conflicts, typically by coercing to the broader type or falling back to VARIANT.

Attribute name evolution. Over time, application code changes may rename attributes (e.g., addr becomes address). Without a schema registry or mapping layer, your Snowflake table accumulates both old and new column names.
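The type-polymorphism case can be handled with a small coercion pass before loading. This sketch keeps a column's values unchanged when all non-null values share one Python type, and otherwise falls back to JSON text bound for a VARIANT column; the VARIANT fallback is one possible convention, not the only one:

```python
import json

def coerce_column(values):
    """Pick a single representation for a column's observed values.

    If every non-null value shares one Python type, keep the values
    as-is; otherwise encode each as JSON text for a VARIANT column.
    """
    types = {type(v) for v in values if v is not None}
    if len(types) <= 1:
        return values
    # Mixed types (e.g. str in one item, int in another): VARIANT fallback.
    return [json.dumps(v) if v is not None else None for v in values]
```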

Practical Pipeline Example

Consider an e-commerce orders table with approximately 50,000 writes per hour. Here is what the end-to-end flow looks like with a managed CDC platform:

  1. Source: DynamoDB table prod-orders with Streams enabled (NEW_AND_OLD_IMAGES).
  2. CDC connector reads from DynamoDB Streams, manages shard iteration, and checkpoints progress.
  3. Type unwrapping strips {"S": ...}, {"N": ...} wrappers and casts values to native types.
  4. Flattening extracts order_id, customer_id, total_amount, status, and created_at into columns. The items list stays as a JSON array.
  5. Snowflake sink writes micro-batches using Snowpipe Streaming for low latency, targeting an orders table in the analytics schema.
  6. Post-load transforms (dbt or Snowflake tasks) build a dim_orders model that joins with dim_customers.

With Streamkap, steps 2 through 5 are handled by configuring a DynamoDB source connector and a Snowflake destination connector. The platform manages shard discovery, type conversion, flattening, and Snowflake ingestion without requiring custom code.

Self-Managed vs Managed Pipelines

Building a DynamoDB-to-Snowflake pipeline with Lambda and S3 is straightforward in concept but demanding in practice. Here is what you take on:

Concern             Self-Managed (Lambda + S3)                  Managed CDC Platform
Shard management    You handle splits, merges, iterator expiry  Automated
Checkpointing       DynamoDB or custom checkpoint table         Built-in
Type unwrapping     Custom code in Lambda                       Automatic
Schema evolution    Manual ALTER TABLE statements               Automatic column addition
Error handling      Dead letter queues, retry logic             Built-in with alerting
Small file problem  Compaction jobs or careful buffering        Optimized batching
Monitoring          CloudWatch dashboards you build             Pre-built metrics
Snowflake auth      Manage key pairs, rotate credentials        Configured once

For teams running one or two tables with stable schemas, the Lambda approach is manageable. For larger deployments with dozens of tables and evolving schemas, the operational overhead compounds quickly.

Monitoring Your Pipeline

Regardless of architecture, there are key metrics to watch:

DynamoDB Streams Side

  • IteratorAge (CloudWatch): The age of the last record read from a shard. If this number climbs toward the 24-hour retention limit, your consumer is falling behind and you risk data loss. Set an alarm at 4 hours.
  • ReadThrottleEvents: Indicates your consumer is reading too aggressively. Back off or request a limit increase.
  • Number of active shards: A sudden spike means your table scaled up, and your consumer must handle more parallel reads.

Snowflake Side

  • Data freshness: Query the CDC timestamp column to measure how far behind Snowflake is from the source. A simple check:
SELECT
  DATEDIFF('second', MAX(_cdc_timestamp), CURRENT_TIMESTAMP()) AS lag_seconds
FROM analytics.orders;
  • Row counts: Compare the DynamoDB item count (from the table description; note DynamoDB refreshes this value only periodically, roughly every six hours) with the Snowflake row count. For append-only tables, Snowflake will have more rows (one per event). For upsert tables, counts should approximately match.
  • Failed loads: Monitor Snowpipe’s COPY_HISTORY table function or Snowpipe Streaming’s offset lag to catch ingestion failures early.

Alerting Baseline

At minimum, set alerts for: iterator age exceeding the 4-hour threshold noted above, data freshness exceeding your SLA (commonly 5 minutes for real-time use cases), and any ingestion errors in Snowflake. Most managed platforms, including Streamkap, provide these alerts out of the box along with pipeline health dashboards.

What This Means for Your Pipeline

Streaming DynamoDB to Snowflake bridges the gap between operational NoSQL workloads and analytical SQL queries. The core technical challenges are specific to DynamoDB: its typed attribute format requires unwrapping, its schema-less nature demands a flattening strategy, and its stream retention window leaves no room for downtime in your consumer. Whether you build on Lambda and S3 or use a managed CDC platform, understanding these fundamentals ensures your pipeline delivers accurate, timely data to Snowflake without surprises.