
Tutorials & How-To

May 22, 2025

15 min read

Load Testing Streaming Pipelines: A Practical Guide

How to load test CDC and streaming pipelines before production — tools, techniques, metrics, and a step-by-step approach to finding breaking points.

Every streaming pipeline works fine at low volume. The question is what happens when your database has a traffic spike, a batch job generates 10 million events in an hour, or your CDC source catches up after a 6-hour outage and replays the backlog. If you have not load tested, you are going to find out in production.

This guide covers how to systematically load test streaming and CDC pipelines: generating realistic load, measuring the right metrics, finding the actual breaking point, and using the results to size your infrastructure.

Why Load Test Before Production

Three scenarios consistently catch teams off guard:

1. Catch-up after downtime. Your CDC pipeline goes down for maintenance. When it comes back, it has to replay all changes that accumulated while it was offline. That replay throughput is orders of magnitude higher than steady-state — and it often hits limits the team has never tested.

2. Upstream traffic spikes. A marketing campaign, a data migration, or end-of-month batch processing drives source database writes to 10-50x normal levels. Your pipeline needs to absorb this without falling behind.

3. Schema-triggered re-snapshots. Some schema changes require a full re-snapshot of one or more tables. A re-snapshot of a 500M-row table generates a burst of events that dwarfs normal CDC throughput. For more on this, see CDC schema evolution at zero downtime.

Load testing answers the question: “At what throughput does this pipeline stop keeping up?” Once you know that number, you can either provision headroom or set up autoscaling to handle it.

Step 1: Define Your Test Scenarios

Before writing a single script, define what you are testing:

Scenario       | Description                             | Key Metric
---------------|-----------------------------------------|-----------
Steady state   | Normal production throughput            | p99 latency, resource utilization
Peak load      | 5-10x normal throughput                 | Latency stability, consumer lag
Burst          | Short spike of 50-100x normal           | Recovery time to zero lag
Sustained high | 2-3x normal for 4+ hours                | Memory leaks, disk accumulation
Backlog replay | Replaying hours of accumulated changes  | Max replay throughput, destination rate limits

Not every scenario matters for every pipeline. A pipeline feeding a real-time dashboard needs to pass the burst test. A pipeline feeding a nightly analytics refresh might only need steady-state testing. Pick the scenarios that match your actual failure modes.

Step 2: Generate Load at the Source

The most accurate test generates load at the source database and lets it flow through the entire pipeline.

PostgreSQL with pgbench

# Initialize pgbench tables (scale factor 100 = ~1.5 GB of data)
pgbench -i -s 100 -h localhost -U postgres testdb

# Run load: 10 clients, 5 threads, for 300 seconds
pgbench -c 10 -j 5 -T 300 -h localhost -U postgres testdb

# Higher throughput: more clients, prepared transactions
pgbench -c 50 -j 10 -T 600 -P 10 -h localhost -U postgres testdb

The -P 10 flag prints progress every 10 seconds, showing you the TPS (transactions per second) the database is sustaining. Each default pgbench transaction performs several UPDATEs, an INSERT, and a SELECT; the writes are what the CDC pipeline picks up.
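If you want to translate a pgbench TPS reading into an expected CDC event rate, note that the default TPC-B-like transaction performs three UPDATEs and one INSERT, while the SELECT produces no change event. A rough sketch (the helper name and the 4-changes-per-transaction assumption are ours):

```python
def expected_cdc_rate(pgbench_tps, changes_per_txn=4):
    """Rough CDC event rate implied by a pgbench TPS number.

    Assumes the default TPC-B-like pgbench script: three UPDATEs plus
    one INSERT per transaction yield ~4 change events; the SELECT is
    not captured by CDC.
    """
    return pgbench_tps * changes_per_txn
```

So a run reporting 5,000 TPS should drive roughly 20,000 change events/second through the pipeline.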

MySQL with sysbench

# Prepare test tables
sysbench oltp_read_write \
  --mysql-host=localhost \
  --mysql-user=root \
  --mysql-db=testdb \
  --tables=4 \
  --table-size=1000000 \
  prepare

# Run load: 16 threads, 300 seconds
sysbench oltp_read_write \
  --mysql-host=localhost \
  --mysql-user=root \
  --mysql-db=testdb \
  --tables=4 \
  --table-size=1000000 \
  --threads=16 \
  --time=300 \
  --report-interval=10 \
  run

Custom Load Generator

When you need events that match your actual schema and data distribution, write a custom generator. Here is a Python example that generates realistic order events:

import psycopg2
import random
import time
from concurrent.futures import ThreadPoolExecutor

def generate_orders(conn_string, batch_size=100, duration_seconds=300):
    conn = psycopg2.connect(conn_string)
    conn.autocommit = True
    cur = conn.cursor()

    products = list(range(1, 5001))
    customers = list(range(1, 100001))
    statuses = ['pending', 'processing', 'shipped', 'delivered']

    start = time.time()
    count = 0

    while time.time() - start < duration_seconds:
        # Mix of operations: 60% INSERT, 25% UPDATE, 15% DELETE
        op = random.choices(['insert', 'update', 'delete'],
                           weights=[60, 25, 15])[0]

        if op == 'insert':
            cur.execute("""
                INSERT INTO orders (customer_id, product_id, quantity,
                                    total, status, created_at)
                VALUES (%s, %s, %s, %s, %s, NOW())
            """, (
                random.choice(customers),
                random.choice(products),
                random.randint(1, 10),
                round(random.uniform(10, 5000), 2),
                'pending'
            ))
        elif op == 'update':
            cur.execute("""
                UPDATE orders
                SET status = %s, updated_at = NOW()
                WHERE id = (
                    SELECT id FROM orders
                    WHERE status != 'delivered'
                    ORDER BY RANDOM() LIMIT 1
                )
            """, (random.choice(statuses),))
        else:
            cur.execute("""
                DELETE FROM orders
                WHERE id = (
                    SELECT id FROM orders
                    WHERE status = 'delivered'
                      AND created_at < NOW() - INTERVAL '7 days'
                    ORDER BY RANDOM() LIMIT 1
                )
            """)

        count += 1
        if count % 1000 == 0:
            elapsed = time.time() - start
            print(f"  {count} operations in {elapsed:.1f}s "
                  f"({count/elapsed:.0f} ops/sec)")

    cur.close()
    conn.close()
    return count

# Run with multiple threads for higher throughput
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [
        pool.submit(generate_orders,
                    "postgresql://user:pass@localhost/testdb",
                    duration_seconds=300)
        for _ in range(8)
    ]
    total = sum(f.result() for f in futures)
    print(f"Total: {total} operations")

Step 3: Generate Load at the Kafka Level

Sometimes you want to test the pipeline from Kafka onward — skipping the source database and CDC connector to isolate the processing and sink stages.

kafka-producer-perf-test

This ships with Kafka and generates high-throughput messages:

# Generate 10 million messages, 1 KB each, targeting 50k msgs/sec
kafka-producer-perf-test \
  --topic cdc-events \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput 50000 \
  --producer-props \
    bootstrap.servers=localhost:9092 \
    acks=all \
    linger.ms=5 \
    batch.size=65536

# Output:
# 10000000 records sent, 49987.5 records/sec (48.82 MB/sec),
# 12.3 ms avg latency, 245.0 ms max latency,
# 8 ms 50th, 23 ms 95th, 89 ms 99th, 198 ms 99.9th.

The --throughput flag is a rate limiter. Set it to -1 for maximum throughput to find the producer’s ceiling, or set it to a specific value to simulate a controlled load level.

Producing Realistic CDC Payloads

kafka-producer-perf-test generates random bytes. For testing CDC consumers that parse structured data, you need messages that match your schema:

# Using kcat (formerly kafkacat) with a JSON payload file
# Generate 100k messages from a template
for i in $(seq 1 100000); do
  echo "{\"op\":\"c\",\"ts_ms\":$(date +%s%3N),\"source\":{\"table\":\"orders\"},\"after\":{\"id\":$i,\"customer_id\":$((RANDOM % 100000)),\"total\":$((RANDOM % 5000)).$((RANDOM % 100)),\"status\":\"pending\"}}"
done | kcat -P -b localhost:9092 -t cdc.myapp.orders

For sustained load, use a compiled language. A Go producer sending structured CDC events can push 200k+ messages/second from a single machine.
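Even in Python, a structured producer is easy to sketch. The payload builder below mirrors the Debezium-style envelope from the kcat example above; the producer wiring assumes the confluent-kafka client, though any client that accepts bytes works (function names are ours):

```python
import json
import random
import time

def cdc_event(seq):
    """Build a Debezium-style change event for the orders table."""
    return {
        "op": "c",
        "ts_ms": int(time.time() * 1000),
        "source": {"table": "orders"},
        "after": {
            "id": seq,
            "customer_id": random.randint(1, 100000),
            "total": round(random.uniform(10, 5000), 2),
            "status": "pending",
        },
    }

def produce_events(producer, topic, count):
    # `producer` is assumed to be a confluent_kafka.Producer;
    # only produce()/poll()/flush() are used here.
    for i in range(1, count + 1):
        producer.produce(topic, json.dumps(cdc_event(i)).encode("utf-8"))
        if i % 10000 == 0:
            producer.poll(0)  # serve delivery callbacks periodically
    producer.flush()
```

This typically tops out well below a compiled producer, but it is often enough to saturate a single consumer under test.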

Step 4: Measure the Right Metrics

During your load test, track these metrics at each pipeline stage:

End-to-End Latency

This is the most important metric: how long between a row changing in the source database and that change appearing in the destination.

# Embed a timestamp in the source row
# INSERT INTO canary (payload, source_ts) VALUES ('test', NOW());

# Read the destination and compute the delta
import time
import psycopg2  # or your destination client

def measure_e2e_latency(source_conn, dest_conn, iterations=100):
    latencies = []
    src = source_conn.cursor()
    dst = dest_conn.cursor()

    for i in range(iterations):
        # Write to source with current timestamp
        write_time = time.time()
        src.execute(
            "INSERT INTO canary (seq, source_ts) VALUES (%s, NOW())",
            (i,)
        )
        source_conn.commit()

        # Poll destination until the row appears
        while True:
            dst.execute(
                "SELECT 1 FROM canary WHERE seq = %s", (i,)
            )
            if dst.fetchone():
                arrive_time = time.time()
                latencies.append(arrive_time - write_time)
                break
            time.sleep(0.01)  # 10ms poll interval

    latencies.sort()
    n = len(latencies)
    print(f"p50: {latencies[n//2]*1000:.0f}ms")
    print(f"p95: {latencies[int(n*0.95)]*1000:.0f}ms")
    print(f"p99: {latencies[int(n*0.99)]*1000:.0f}ms")
    print(f"max: {latencies[-1]*1000:.0f}ms")

Kafka Consumer Lag

# Check lag for your consumer group
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group my-cdc-sink-group

During a load test, run this in a loop and chart the lag over time:

# Sample lag every 5 seconds
while true; do
  echo "$(date +%H:%M:%S) $(
    kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 \
      --describe \
      --group my-cdc-sink-group 2>/dev/null \
      | awk 'NR>1 {sum += $6} END {print sum}'
  )"
  sleep 5
done

If lag is flat, you are below the pipeline’s capacity. If lag is growing linearly, you have hit the breaking point. For detailed triage of lag issues, see Kafka consumer lag debugging.
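To make the flat-versus-growing call objective rather than eyeballed, fit a slope to the sampled lag values. A small sketch (names and threshold are ours) using a least-squares slope over (seconds, total lag) pairs collected by the loop above:

```python
def lag_trend(samples, flat_threshold=10.0):
    """Classify consumer lag trend from (seconds, total_lag) samples.

    Returns 'flat' if the least-squares slope is below flat_threshold
    (messages of lag gained per second), otherwise 'growing'.
    """
    n = len(samples)
    xs = [t for t, _ in samples]
    ys = [lag for _, lag in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    if denom == 0:
        return "flat"  # all samples at the same instant
    slope = sum((x - x_mean) * (y - y_mean) for x, y in samples) / denom
    return "growing" if slope > flat_threshold else "flat"
```

A persistent positive slope across a multi-minute window is the signal that matters; a single spike that drains back down is not.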

Resource Utilization

Monitor CPU, memory, disk I/O, and network on every component:

  • Source database: Is the CDC reader adding noticeable load? Check pg_stat_replication or SHOW PROCESSLIST.
  • Kafka brokers: Disk write throughput, network in/out, under-replicated partitions.
  • Stream processing: CPU and memory of your processing layer.
  • Destination: Write latency, connection pool usage, rate limit responses.
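For the source database check, pg_stat_replication reports WAL positions as LSNs; the CDC slot's lag in bytes is the difference between the current write position and the slot's replayed position. A sketch of the LSN arithmetic (helper names are ours):

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN like '0/16B3748' to an absolute byte position.

    An LSN is 'hi/lo' in hex; the byte position is (hi << 32) + lo.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(current_lsn, replayed_lsn):
    """Bytes of WAL the CDC slot still has to process."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(replayed_lsn)
```

If this number grows during the test, the CDC reader itself is falling behind the database, independent of anything downstream.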

Step 5: Find the Breaking Point

The systematic approach:

  1. Baseline: Run at your current production throughput for 30 minutes. Record all metrics. Everything should be stable.

  2. Ramp up: Increase throughput by 2x. Run for 30 minutes. Check if latency is still stable and lag is not growing.

  3. Keep doubling: 4x, 8x, 16x. At each level, run long enough to confirm stability (at least 15 minutes).

  4. Identify the ceiling: The breaking point is the throughput level where latency starts growing unboundedly or where you hit hard errors (OOM, connection timeouts, destination rate limits).

  5. Test sustained load at 60-70% of ceiling: This is your safe operating capacity. Run for 2-4 hours to catch slow memory leaks or disk accumulation.
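The ramp procedure is easy to script. A small sketch (function names and the 65% default are ours) that generates the doubling schedule and derives the safe operating throughput from a measured ceiling:

```python
def ramp_levels(baseline, max_multiplier=16):
    """Doubling ramp schedule (events/sec) from a baseline throughput."""
    levels = []
    m = 1
    while m <= max_multiplier:
        levels.append(baseline * m)
        m *= 2
    return levels

def safe_capacity(ceiling, fraction=0.65):
    """Safe operating throughput at ~60-70% of the measured ceiling."""
    return int(ceiling * fraction)
```

For a 10k events/s baseline this yields test levels of 10k, 20k, 40k, 80k, and 160k; if the pipeline breaks at 80k, the sustained-load test runs at roughly 52k.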

What the Breaking Point Looks Like

Throughput    | p99 Latency | Lag Trend  | Status
-------------|-------------|------------|--------
10k events/s |     120ms   | Flat       | OK
20k events/s |     180ms   | Flat       | OK
40k events/s |     350ms   | Flat       | OK
80k events/s |     900ms   | Growing    | BREAKING POINT
60k events/s |     500ms   | Flat       | Safe operating max

At 80k events/s, latency is not just higher — it keeps increasing over time. This means the pipeline is processing slower than events arrive, and the backlog grows indefinitely.

Step 6: Diagnose Bottlenecks

When you find the breaking point, identify which component is the bottleneck:

Symptom                           | Likely Bottleneck                 | Action
----------------------------------|-----------------------------------|-------
Source DB CPU > 80%               | CDC reading adds too much load    | Throttle CDC reads, use a read replica
Kafka broker disk write saturated | Kafka can’t persist fast enough   | Add brokers, use faster disks
Consumer CPU at 100%              | Processing logic is too slow      | Optimize transforms, add parallelism
Destination write latency spiking | Sink can’t absorb the throughput  | Batch writes, increase connection pool, check destination limits
Network throughput capped         | Bandwidth between components      | Compress, use larger batch sizes, co-locate
GC pauses in consumer             | JVM heap too small or too large   | Tune heap size, check for memory leaks

For a deeper look at infrastructure sizing, stream processing architecture covers the patterns for building pipelines that scale.

Step 7: Test Recovery Scenarios

Beyond raw throughput, test these recovery scenarios:

Consumer Restart Recovery

  1. During a load test, kill the consumer.
  2. Let events accumulate for 15-30 minutes.
  3. Restart the consumer.
  4. Measure how long it takes to drain the backlog and return to zero lag.
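Before running this test, you can estimate the expected drain time: while the consumer is catching up at its ceiling throughput, new events keep arriving, so the backlog shrinks at the difference between the two rates. A quick sketch (names are ours):

```python
def drain_time_seconds(backlog_events, max_throughput, incoming_rate):
    """Estimate time to return to zero lag after a consumer restart.

    The backlog shrinks at (max_throughput - incoming_rate) events/sec;
    if incoming rate meets or exceeds capacity, it never drains.
    """
    net_rate = max_throughput - incoming_rate
    if net_rate <= 0:
        raise ValueError("pipeline cannot drain: incoming rate >= capacity")
    return backlog_events / net_rate
```

For example, a 30-minute outage at 10k events/s leaves an 18M-event backlog; with a 60k events/s ceiling the pipeline drains it in about 6 minutes. If the measured recovery takes much longer than the estimate, something throttles during catch-up.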

Destination Outage Recovery

  1. During a load test, make the destination unavailable (firewall, stop the process).
  2. After 30 minutes, restore the destination.
  3. Verify the pipeline recovers — no data loss, lag drains to zero, no duplicate writes (or duplicates are handled by idempotency).

Source Schema Change Under Load

  1. During a load test, run an ALTER TABLE ADD COLUMN on a replicated table.
  2. Verify the pipeline handles the schema change without data loss.
  3. Check the destination for the new column.

Putting It Together: A Load Test Checklist

  1. Define throughput targets based on peak production + headroom
  2. Set up monitoring for all five key metrics (latency, throughput, lag, errors, resources)
  3. Generate load at the source database using pgbench/sysbench or a custom generator
  4. Run steady-state test for 30+ minutes
  5. Ramp up in 2x increments until you find the breaking point
  6. Sustain at 60-70% of ceiling for 2+ hours
  7. Test recovery: consumer restart, destination outage, schema change
  8. Document the results: ceiling throughput, safe operating throughput, and the bottleneck

Using Managed Platforms for Load Testing

If you are evaluating a managed streaming platform like Streamkap, you can run the same tests: generate load at the source, measure end-to-end latency at the destination, and ramp up throughput until you find the limit. The difference is that the platform handles the scaling — you observe whether the platform meets your SLO rather than tuning infrastructure yourself.

This is a genuine advantage during load testing, because it narrows the question from “can I tune Kafka + my consumer + my sink connector to handle this load?” to “does the managed platform handle this load within my latency budget?”


Ready to skip the infrastructure tuning and focus on your data? Streamkap manages the entire streaming pipeline from source to destination, automatically scaling to handle traffic spikes and backlog replay. Start a free trial or learn more about the platform.