
Tutorials & How-To

May 22, 2025

15 min read

Load Testing Streaming Pipelines: A Practical Guide

How to load test CDC and streaming pipelines before production — tools, techniques, metrics, and a step-by-step approach to finding breaking points.

Every streaming pipeline works fine at low volume. The question is what happens when your database has a traffic spike, a batch job generates 10 million events in an hour, or your CDC source catches up after a 6-hour outage and replays the backlog. If you have not load tested, you are going to find out in production.

This guide covers how to systematically load test streaming and CDC pipelines: generating realistic load, measuring the right metrics, finding the actual breaking point, and using the results to size your infrastructure.

Why Load Test Before Production

Three scenarios consistently catch teams off guard:

1. Catch-up after downtime. Your CDC pipeline goes down for maintenance. When it comes back, it has to replay all changes that accumulated while it was offline. That replay throughput is orders of magnitude higher than steady-state — and it often hits limits the team has never tested.

2. Upstream traffic spikes. A marketing campaign, a data migration, or end-of-month batch processing drives source database writes to 10-50x normal levels. Your pipeline needs to absorb this without falling behind.

3. Schema-triggered re-snapshots. Some schema changes require a full re-snapshot of one or more tables. A re-snapshot of a 500M-row table generates a burst of events that dwarfs normal CDC throughput. For more on this, see CDC schema evolution at zero downtime.

Load testing answers the question: “At what throughput does this pipeline stop keeping up?” Once you know that number, you can either provision headroom or set up autoscaling to handle it.

Step 1: Define Your Test Scenarios

Before writing a single script, define what you are testing:

Scenario       | Description                             | Key Metric
---------------|-----------------------------------------|-----------
Steady state   | Normal production throughput            | p99 latency, resource utilization
Peak load      | 5-10x normal throughput                 | Latency stability, consumer lag
Burst          | Short spike of 50-100x normal           | Recovery time to zero lag
Sustained high | 2-3x normal for 4+ hours                | Memory leaks, disk accumulation
Backlog replay | Replaying hours of accumulated changes  | Max replay throughput, destination rate limits

Not every scenario matters for every pipeline. A pipeline feeding a real-time dashboard needs to pass the burst test. A pipeline feeding a nightly analytics refresh might only need steady-state testing. Pick the scenarios that match your actual failure modes.

Step 2: Generate Load at the Source

The most accurate test generates load at the source database and lets it flow through the entire pipeline.

PostgreSQL with pgbench

# Initialize pgbench tables (scale factor 100 = ~1.5 GB of data)
pgbench -i -s 100 -h localhost -U postgres testdb

# Run load: 10 clients, 5 threads, for 300 seconds
pgbench -c 10 -j 5 -T 300 -h localhost -U postgres testdb

# Higher throughput: more clients, prepared transactions
pgbench -c 50 -j 10 -T 600 -P 10 -h localhost -U postgres testdb

The -P 10 flag prints progress every 10 seconds, showing you the TPS (transactions per second) the database is sustaining. Each default pgbench transaction performs several UPDATEs, an INSERT, and a SELECT; the writes are what the CDC pipeline picks up.
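If you want to translate a pgbench TPS reading into an expected CDC event rate, note that the default TPC-B-like transaction performs three UPDATEs and one INSERT, while the SELECT produces no change event. A rough sketch (the helper name and the 4-changes-per-transaction assumption are ours):

```python
def expected_cdc_rate(pgbench_tps, changes_per_txn=4):
    """Rough CDC event rate implied by a pgbench TPS number.

    Assumes the default TPC-B-like pgbench script: three UPDATEs plus
    one INSERT per transaction yield ~4 change events; the SELECT is
    not captured by CDC.
    """
    return pgbench_tps * changes_per_txn
```

So a run reporting 5,000 TPS should drive roughly 20,000 change events/second through the pipeline.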

MySQL with sysbench

# Prepare test tables
sysbench oltp_read_write \
  --mysql-host=localhost \
  --mysql-user=root \
  --mysql-db=testdb \
  --tables=4 \
  --table-size=1000000 \
  prepare

# Run load: 16 threads, 300 seconds
sysbench oltp_read_write \
  --mysql-host=localhost \
  --mysql-user=root \
  --mysql-db=testdb \
  --tables=4 \
  --table-size=1000000 \
  --threads=16 \
  --time=300 \
  --report-interval=10 \
  run

Custom Load Generator

When you need events that match your actual schema and data distribution, write a custom generator. Here is a Python example that generates realistic order events:

import psycopg2
import random
import time
from concurrent.futures import ThreadPoolExecutor

def generate_orders(conn_string, batch_size=100, duration_seconds=300):
    conn = psycopg2.connect(conn_string)
    conn.autocommit = True
    cur = conn.cursor()

    products = list(range(1, 5001))
    customers = list(range(1, 100001))
    statuses = ['pending', 'processing', 'shipped', 'delivered']

    start = time.time()
    count = 0

    while time.time() - start < duration_seconds:
        # Mix of operations: 60% INSERT, 25% UPDATE, 15% DELETE
        op = random.choices(['insert', 'update', 'delete'],
                           weights=[60, 25, 15])[0]

        if op == 'insert':
            cur.execute("""
                INSERT INTO orders (customer_id, product_id, quantity,
                                    total, status, created_at)
                VALUES (%s, %s, %s, %s, %s, NOW())
            """, (
                random.choice(customers),
                random.choice(products),
                random.randint(1, 10),
                round(random.uniform(10, 5000), 2),
                'pending'
            ))
        elif op == 'update':
            cur.execute("""
                UPDATE orders
                SET status = %s, updated_at = NOW()
                WHERE id = (
                    SELECT id FROM orders
                    WHERE status != 'delivered'
                    ORDER BY RANDOM() LIMIT 1
                )
            """, (random.choice(statuses),))
        else:
            cur.execute("""
                DELETE FROM orders
                WHERE id = (
                    SELECT id FROM orders
                    WHERE status = 'delivered'
                      AND created_at < NOW() - INTERVAL '7 days'
                    ORDER BY RANDOM() LIMIT 1
                )
            """)

        count += 1
        if count % 1000 == 0:
            elapsed = time.time() - start
            print(f"  {count} operations in {elapsed:.1f}s "
                  f"({count/elapsed:.0f} ops/sec)")

    cur.close()
    conn.close()
    return count

# Run with multiple threads for higher throughput
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [
        pool.submit(generate_orders,
                    "postgresql://user:pass@localhost/testdb",
                    duration_seconds=300)
        for _ in range(8)
    ]
    total = sum(f.result() for f in futures)
    print(f"Total: {total} operations")

Step 3: Generate Load at the Kafka Level

Sometimes you want to test the pipeline from Kafka onward — skipping the source database and CDC connector to isolate the processing and sink stages.

kafka-producer-perf-test

This ships with Kafka and generates high-throughput messages:

# Generate 10 million messages, 1 KB each, targeting 50k msgs/sec
kafka-producer-perf-test \
  --topic cdc-events \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput 50000 \
  --producer-props \
    bootstrap.servers=localhost:9092 \
    acks=all \
    linger.ms=5 \
    batch.size=65536

# Output:
# 10000000 records sent, 49987.5 records/sec (48.82 MB/sec),
# 12.3 ms avg latency, 245.0 ms max latency,
# 8 ms 50th, 23 ms 95th, 89 ms 99th, 198 ms 99.9th.

The --throughput flag is a rate limiter. Set it to -1 for maximum throughput to find the producer’s ceiling, or set it to a specific value to simulate a controlled load level.

Producing Realistic CDC Payloads

kafka-producer-perf-test generates random bytes. For testing CDC consumers that parse structured data, you need messages that match your schema:

# Using kcat (formerly kafkacat) with a JSON payload file
# Generate 100k messages from a template
for i in $(seq 1 100000); do
  echo "{\"op\":\"c\",\"ts_ms\":$(date +%s%3N),\"source\":{\"table\":\"orders\"},\"after\":{\"id\":$i,\"customer_id\":$((RANDOM % 100000)),\"total\":$((RANDOM % 5000)).$((RANDOM % 100)),\"status\":\"pending\"}}"
done | kcat -P -b localhost:9092 -t cdc.myapp.orders

For sustained load, use a compiled language. A Go producer sending structured CDC events can push 200k+ messages/second from a single machine.
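Even in Python, a structured producer is easy to sketch. The payload builder below mirrors the Debezium-style envelope from the kcat example above; the producer wiring assumes the confluent-kafka client, though any client that accepts bytes works (function names are ours):

```python
import json
import random
import time

def cdc_event(seq):
    """Build a Debezium-style change event for the orders table."""
    return {
        "op": "c",
        "ts_ms": int(time.time() * 1000),
        "source": {"table": "orders"},
        "after": {
            "id": seq,
            "customer_id": random.randint(1, 100000),
            "total": round(random.uniform(10, 5000), 2),
            "status": "pending",
        },
    }

def produce_events(producer, topic, count):
    # `producer` is assumed to be a confluent_kafka.Producer;
    # only produce()/poll()/flush() are used here.
    for i in range(1, count + 1):
        producer.produce(topic, json.dumps(cdc_event(i)).encode("utf-8"))
        if i % 10000 == 0:
            producer.poll(0)  # serve delivery callbacks periodically
    producer.flush()
```

This typically tops out well below a compiled producer, but it is often enough to saturate a single consumer under test.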

Step 4: Measure the Right Metrics

During your load test, track these metrics at each pipeline stage:

End-to-End Latency

This is the most important metric: how long between a row changing in the source database and that change appearing in the destination.

# Embed a timestamp in the source row
# INSERT INTO canary (payload, source_ts) VALUES ('test', NOW());

# Read the destination and compute the delta
import time
import psycopg2  # or your destination client

def measure_e2e_latency(source_conn, dest_conn, iterations=100):
    latencies = []
    src = source_conn.cursor()
    dst = dest_conn.cursor()

    for i in range(iterations):
        # Write to source with current timestamp
        write_time = time.time()
        src.execute(
            "INSERT INTO canary (seq, source_ts) VALUES (%s, NOW())",
            (i,)
        )
        source_conn.commit()

        # Poll destination until the row appears
        while True:
            dst.execute(
                "SELECT 1 FROM canary WHERE seq = %s", (i,)
            )
            if dst.fetchone():
                arrive_time = time.time()
                latencies.append(arrive_time - write_time)
                break
            time.sleep(0.01)  # 10ms poll interval

    latencies.sort()
    n = len(latencies)
    print(f"p50: {latencies[n//2]*1000:.0f}ms")
    print(f"p95: {latencies[int(n*0.95)]*1000:.0f}ms")
    print(f"p99: {latencies[int(n*0.99)]*1000:.0f}ms")
    print(f"max: {latencies[-1]*1000:.0f}ms")

Kafka Consumer Lag

# Check lag for your consumer group
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group my-cdc-sink-group

During a load test, run this in a loop and chart the lag over time:

# Sample lag every 5 seconds
while true; do
  echo "$(date +%H:%M:%S) $(
    kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 \
      --describe \
      --group my-cdc-sink-group 2>/dev/null \
      | awk 'NR>1 {sum += $6} END {print sum}'
  )"
  sleep 5
done

If lag is flat, you are below the pipeline’s capacity. If lag is growing linearly, you have hit the breaking point. For detailed triage of lag issues, see Kafka consumer lag debugging.
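To make the flat-versus-growing call objective rather than eyeballed, fit a slope to the sampled lag values. A small sketch (names and threshold are ours) using a least-squares slope over (seconds, total lag) pairs collected by the loop above:

```python
def lag_trend(samples, flat_threshold=10.0):
    """Classify consumer lag trend from (seconds, total_lag) samples.

    Returns 'flat' if the least-squares slope is below flat_threshold
    (messages of lag gained per second), otherwise 'growing'.
    """
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [lag for _, lag in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    if denom == 0:
        return "flat"  # all samples at the same instant
    slope = sum((x - x_mean) * (y - y_mean) for x, y in samples) / denom
    return "growing" if slope > flat_threshold else "flat"
```

A persistent positive slope across a multi-minute window is the signal that matters; a single spike that drains back down is not.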

Resource Utilization

Monitor CPU, memory, disk I/O, and network on every component:

  • Source database: Is the CDC reader adding noticeable load? Check pg_stat_replication or SHOW PROCESSLIST.
  • Kafka brokers: Disk write throughput, network in/out, under-replicated partitions.
  • Stream processing: CPU and memory of your processing layer.
  • Destination: Write latency, connection pool usage, rate limit responses.
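For the source database check, pg_stat_replication reports WAL positions as LSNs; the CDC slot's lag in bytes is the difference between the current write position and the slot's replayed position. A sketch of the LSN arithmetic (helper names are ours):

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN like '0/16B3748' to an absolute byte position.

    An LSN is 'hi/lo' in hex; the byte position is (hi << 32) + lo.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(current_lsn, replayed_lsn):
    """Bytes of WAL the CDC slot still has to process."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(replayed_lsn)
```

If this number grows during the test, the CDC reader itself is falling behind the database, independent of anything downstream.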

Step 5: Find the Breaking Point

The systematic approach:

  1. Baseline: Run at your current production throughput for 30 minutes. Record all metrics. Everything should be stable.

  2. Ramp up: Increase throughput by 2x. Run for 30 minutes. Check if latency is still stable and lag is not growing.

  3. Keep doubling: 4x, 8x, 16x. At each level, run long enough to confirm stability (at least 15 minutes).

  4. Identify the ceiling: The breaking point is the throughput level where latency starts growing unboundedly or where you hit hard errors (OOM, connection timeouts, destination rate limits).

  5. Test sustained load at 60-70% of ceiling: This is your safe operating capacity. Run for 2-4 hours to catch slow memory leaks or disk accumulation.
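The ramp procedure is easy to script. A small sketch (function names and the 65% default are ours) that generates the doubling schedule and derives the safe operating throughput from a measured ceiling:

```python
def ramp_levels(baseline, max_multiplier=16):
    """Doubling ramp schedule (events/sec) from a baseline throughput."""
    levels = []
    m = 1
    while m <= max_multiplier:
        levels.append(baseline * m)
        m *= 2
    return levels

def safe_capacity(ceiling, fraction=0.65):
    """Safe operating throughput at ~60-70% of the measured ceiling."""
    return int(ceiling * fraction)
```

For a 10k events/s baseline this yields test levels of 10k, 20k, 40k, 80k, and 160k; if the pipeline breaks at 80k, the sustained-load test runs at roughly 52k.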

What the Breaking Point Looks Like

Throughput    | p99 Latency | Lag Trend  | Status
-------------|-------------|------------|--------
10k events/s |     120ms   | Flat       | OK
20k events/s |     180ms   | Flat       | OK
40k events/s |     350ms   | Flat       | OK
80k events/s |     900ms   | Growing    | BREAKING POINT
60k events/s |     500ms   | Flat       | Safe operating max

At 80k events/s, latency is not just higher — it keeps increasing over time. This means the pipeline is processing slower than events arrive, and the backlog grows indefinitely.

Step 6: Diagnose Bottlenecks

When you find the breaking point, identify which component is the bottleneck:

Symptom                           | Likely Bottleneck                 | Action
----------------------------------|-----------------------------------|-------
Source DB CPU > 80%               | CDC reading adds too much load    | Throttle CDC reads, use a read replica
Kafka broker disk write saturated | Kafka can’t persist fast enough   | Add brokers, use faster disks
Consumer CPU at 100%              | Processing logic is too slow      | Optimize transforms, add parallelism
Destination write latency spiking | Sink can’t absorb the throughput  | Batch writes, increase connection pool, check destination limits
Network throughput capped         | Bandwidth between components      | Compress, use larger batch sizes, co-locate
GC pauses in consumer             | JVM heap too small or too large   | Tune heap size, check for memory leaks

For a deeper look at infrastructure sizing, stream processing architecture covers the patterns for building pipelines that scale.

Step 7: Test Recovery Scenarios

Beyond raw throughput, test these recovery scenarios:

Consumer Restart Recovery

  1. During a load test, kill the consumer.
  2. Let events accumulate for 15-30 minutes.
  3. Restart the consumer.
  4. Measure how long it takes to drain the backlog and return to zero lag.
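Before running this test, you can estimate the expected drain time: while the consumer is catching up at its ceiling throughput, new events keep arriving, so the backlog shrinks at the difference between the two rates. A quick sketch (names are ours):

```python
def drain_time_seconds(backlog_events, max_throughput, incoming_rate):
    """Estimate time to return to zero lag after a consumer restart.

    The backlog shrinks at (max_throughput - incoming_rate) events/sec;
    if incoming rate meets or exceeds capacity, it never drains.
    """
    net_rate = max_throughput - incoming_rate
    if net_rate <= 0:
        raise ValueError("pipeline cannot drain: incoming rate >= capacity")
    return backlog_events / net_rate
```

For example, a 30-minute outage at 10k events/s leaves an 18M-event backlog; with a 60k events/s ceiling the pipeline drains it in about 6 minutes. If the measured recovery takes much longer than the estimate, something throttles during catch-up.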

Destination Outage Recovery

  1. During a load test, make the destination unavailable (firewall, stop the process).
  2. After 30 minutes, restore the destination.
  3. Verify the pipeline recovers — no data loss, lag drains to zero, no duplicate writes (or duplicates are handled by idempotency).

Source Schema Change Under Load

  1. During a load test, run an ALTER TABLE ADD COLUMN on a replicated table.
  2. Verify the pipeline handles the schema change without data loss.
  3. Check the destination for the new column.

Putting It Together: A Load Test Checklist

  1. Define throughput targets based on peak production + headroom
  2. Set up monitoring for all five key metrics (latency, throughput, lag, errors, resources)
  3. Generate load at the source database using pgbench/sysbench or a custom generator
  4. Run steady-state test for 30+ minutes
  5. Ramp up in 2x increments until you find the breaking point
  6. Sustain at 60-70% of ceiling for 2+ hours
  7. Test recovery: consumer restart, destination outage, schema change
  8. Document the results: ceiling throughput, safe operating throughput, and the bottleneck

Using Managed Platforms for Load Testing

If you are evaluating a managed streaming platform like Streamkap, you can run the same tests: generate load at the source, measure end-to-end latency at the destination, and ramp up throughput until you find the limit. The difference is that the platform handles the scaling — you observe whether the platform meets your SLO rather than tuning infrastructure yourself.

This is a genuine advantage during load testing, because it narrows the question from “can I tune Kafka + my consumer + my sink connector to handle this load?” to “does the managed platform handle this load within my latency budget?”


Ready to skip the infrastructure tuning and focus on your data? Streamkap manages the entire streaming pipeline from source to destination, automatically scaling to handle traffic spikes and backlog replay. Start a free trial or learn more about the platform.