Webhook to Kafka: Reliable Ingestion at Scale
How to build a reliable webhook ingestion layer using Kafka as a durable buffer — covering deduplication, ordering, retries, and dead letter queues.
Webhooks are the default integration pattern for SaaS platforms. Stripe sends payment events, GitHub sends push notifications, Shopify sends order updates. The pattern is simple: the sender fires an HTTP POST to your endpoint whenever something happens.
The problem starts when you need to process these events reliably at scale. A single slow consumer, a brief outage, or a traffic spike can cause dropped events — and most webhook senders have limited retry windows. Once that window closes, the data is gone.
This guide covers how to build a webhook ingestion layer that never drops events, handles duplicates correctly, and scales independently of your processing logic.
The Core Architecture
The key insight is separating receipt from processing. Your webhook endpoint should do exactly three things:
- Validate the request (auth, schema)
- Write it to a durable store
- Return 200 immediately
That durable store is Kafka. Everything downstream — transformation, enrichment, loading into databases — happens asynchronously by consuming from Kafka topics.
```
Webhook Sender → [Your Endpoint] → Kafka Topic → [Consumer A: Analytics DB]
                                               → [Consumer B: Cache Update]
                                               → [Consumer C: Notification Service]
```
This gives you several properties that a direct webhook-to-processing pipeline doesn’t:
- Durability: Events survive consumer downtime
- Replay: You can reprocess historical events by resetting consumer offsets
- Fan-out: Multiple consumers can independently process the same events
- Backpressure: Consumers process at their own rate without affecting receipt
The Webhook Receiver
Here’s a minimal webhook receiver in Python that validates, deduplicates, and produces to Kafka:
```python
from datetime import datetime, timezone

from fastapi import FastAPI, Request, HTTPException
from aiokafka import AIOKafkaProducer
import hashlib
import json
import redis.asyncio as redis

app = FastAPI()
producer: AIOKafkaProducer | None = None
redis_client: redis.Redis | None = None

DEDUP_TTL_SECONDS = 86400  # 24 hours

@app.on_event("startup")
async def startup():
    global producer, redis_client
    producer = AIOKafkaProducer(
        bootstrap_servers="kafka:9092",
        acks="all",               # Wait for all replicas
        enable_idempotence=True,  # Exactly-once producer
        value_serializer=lambda v: json.dumps(v).encode(),
    )
    await producer.start()
    redis_client = redis.Redis(host="redis", port=6379)

@app.post("/webhooks/{source}")
async def receive_webhook(source: str, request: Request):
    body = await request.body()
    payload = json.loads(body)

    # 1. Validate authentication (HMAC, API key, etc.)
    if not validate_signature(request, body, source):
        raise HTTPException(status_code=401, detail="Invalid signature")

    # 2. Extract or generate idempotency key
    idempotency_key = extract_idempotency_key(payload, request.headers)
    if idempotency_key is None:
        idempotency_key = hashlib.sha256(body).hexdigest()

    # 3. Check for duplicates
    if await redis_client.exists(f"webhook:seen:{idempotency_key}"):
        return {"status": "already_processed"}  # Still return 200

    # 4. Produce to Kafka with entity-based key for ordering
    entity_key = extract_entity_key(payload, source)
    await producer.send_and_wait(
        topic=f"webhooks.{source}",
        key=entity_key.encode() if entity_key else None,
        value={
            "source": source,
            "idempotency_key": idempotency_key,
            "received_at": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
            "headers": dict(request.headers),
        },
    )

    # 5. Mark as seen (with TTL so Redis doesn't grow forever)
    await redis_client.setex(
        f"webhook:seen:{idempotency_key}",
        DEDUP_TTL_SECONDS,
        "1",
    )
    return {"status": "accepted"}
```
A few things to note:
- The endpoint always returns 200 (or a non-retryable 4xx for auth failures). Returning 5xx tells the sender to retry, which you want to avoid if you’ve already persisted the event.
- The idempotency check happens before producing to Kafka, so duplicate sends from the webhook source get caught early.
- `acks="all"` and `enable_idempotence=True` on the Kafka producer ensure the event is durably written to all replicas before we ACK the webhook.
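The `validate_signature` helper is left abstract above. For a GitHub-style scheme — the sender computes HMAC-SHA256 over the raw body with a shared secret and sends it as `sha256=<hexdigest>` — the core check might look like this sketch (looking up the per-source secret and header name is assumed to happen in the caller):

```python
import hashlib
import hmac

def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style HMAC-SHA256 signature ("sha256=<hexdigest>").

    The receiver's validate_signature would resolve the per-source
    secret and header, then delegate to a check like this one.
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, resisting timing attacks
    return hmac.compare_digest(expected, signature_header)
```

Always compute the HMAC over the raw request bytes, not the re-serialized JSON — re-serialization can reorder keys or change whitespace and break the signature.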
Deduplication with Idempotency Keys
Duplicate webhook deliveries are not an edge case — they’re expected behavior. Senders retry on timeouts, network blips cause double-delivery, and some platforms explicitly document at-least-once delivery.
Most webhook providers include a unique event identifier:
| Provider | Idempotency Key Header/Field |
|---|---|
| Stripe | `id` field in the event payload |
| GitHub | `X-GitHub-Delivery` header |
| Shopify | `X-Shopify-Webhook-Id` header |
| Twilio | `MessageSid` in payload |
If the sender doesn’t provide one, generate a deterministic key by hashing the payload body. This catches exact-duplicate deliveries, though it won’t catch semantically duplicate events with different timestamps.
For more on deduplication patterns in streaming systems, see Data Deduplication in Streaming Pipelines.
Where to Store Seen Keys
Redis with a TTL is the simplest approach. The TTL should be longer than the sender’s maximum retry window (typically 24-72 hours). At scale, a Bloom filter can reduce memory usage at the cost of a small false-positive rate — acceptable if a rare duplicate getting through is tolerable.
For exactly-once guarantees all the way through, you need idempotency at the consumer level too.
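Consumer-level idempotency means tracking applied keys alongside the results, so a redelivered event becomes a no-op. A minimal in-memory sketch for illustration — a real consumer would keep both in the same transactional store so the check and the write commit atomically:

```python
class IdempotentConsumer:
    """Apply each event at most once, keyed by idempotency_key."""

    def __init__(self):
        self.applied: set[str] = set()
        self.results: list[dict] = []

    def process(self, event: dict) -> bool:
        key = event["idempotency_key"]
        if key in self.applied:
            return False  # duplicate delivery: skip
        self.results.append(event["payload"])  # the actual side effect
        self.applied.add(key)
        return True
```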
Ordering: What You Get and What You Don’t
Webhooks have no inherent ordering guarantee. If a sender fires three events for the same order — order.created, order.updated, order.fulfilled — they might arrive in any sequence depending on network conditions.
Kafka gives you per-partition ordering. By choosing a partition key carefully, you can guarantee that events for the same entity are processed in order:
```python
def extract_entity_key(payload: dict, source: str) -> str:
    """Extract entity ID for Kafka partition key.

    Events with the same key go to the same partition,
    guaranteeing ordering within that entity.
    """
    if source == "stripe":
        # All events for the same customer stay ordered
        return payload.get("data", {}).get("object", {}).get("customer", "")
    elif source == "github":
        # All events for the same repo stay ordered
        return payload.get("repository", {}).get("full_name", "")
    elif source == "shopify":
        # All events for the same order stay ordered
        return str(payload.get("id", ""))
    return ""
```
This means order.created and order.fulfilled for order #1234 will always be in the same partition and processed in order. But events for order #1234 and order #5678 might be in different partitions and processed in parallel.
If you need strict global ordering across all entities, you’d need a single partition — which kills parallelism. In practice, per-entity ordering is almost always sufficient.
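Key-to-partition assignment can be modeled as hash-then-modulo. Kafka's default partitioner uses murmur2; a stable stdlib hash stands in here purely to illustrate the property:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Illustrative model of key-based partitioning: the same key
    always maps to the same partition; different keys spread out."""
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return digest % num_partitions
```

This is also why changing the partition count of a topic breaks ordering at the boundary: the modulo changes, so the same key may start landing on a different partition. Size partitions generously up front.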
Retry Handling
There are two sides to retries: the sender retrying delivery to your endpoint, and your consumers retrying failed processing.
Sender-Side Retries
You generally don’t control the sender’s retry behavior, but you can influence it:
- Return 200 quickly (under 5 seconds). Most senders have aggressive timeouts.
- Return 200 even for duplicates. A 200 tells the sender “I got it, stop retrying.”
- Never return 5xx for payload validation failures. Use 4xx. A 5xx triggers retries that will never succeed for a malformed payload.
Consumer-Side Retries
When your Kafka consumer fails to process an event (database down, external API error, bug in transformation logic), you have several options:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def process_with_retry(event: dict, max_retries: int = 3):
    """Process an event with exponential backoff.

    After max_retries, route to the dead letter queue.
    """
    for attempt in range(max_retries):
        try:
            await process_event(event)
            return
        except TransientError as e:
            wait_time = min(2 ** attempt, 30)  # 1s, 2s, 4s... max 30s
            logger.warning(
                f"Transient error processing {event['idempotency_key']}, "
                f"attempt {attempt + 1}/{max_retries}, "
                f"retrying in {wait_time}s: {e}"
            )
            await asyncio.sleep(wait_time)
        except PermanentError as e:
            logger.error(f"Permanent error, routing to DLQ: {e}")
            break

    # All retries exhausted or permanent error
    await route_to_dlq(event)
```
The key distinction is between transient errors (network timeout, rate limit, temporary unavailability) and permanent errors (malformed data, missing required fields, schema mismatch). Only retry transient errors.
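The `TransientError`/`PermanentError` split used above can be made explicit by classifying raw exceptions at the processing boundary. An illustrative mapping — the right buckets depend on the libraries you call:

```python
class TransientError(Exception):
    """Retryable: the same call may succeed later."""

class PermanentError(Exception):
    """Not retryable: the event itself can never be processed."""

def classify(exc: Exception) -> Exception:
    """Map a raw exception onto retry semantics."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return TransientError(str(exc))
    if isinstance(exc, (KeyError, ValueError, TypeError)):
        return PermanentError(str(exc))
    # Unknown errors fail safe to the DLQ rather than retrying forever
    return PermanentError(str(exc))
```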
Dead Letter Queues
A dead letter queue (DLQ) is a separate Kafka topic where events that fail processing are sent. This prevents a single bad event from blocking the entire pipeline — a pattern sometimes called “poison pill” handling.
```python
DLQ_TOPIC = "webhooks.dlq"

async def route_to_dlq(event: dict, error: str = ""):
    """Route a failed event to the dead letter queue."""
    dlq_event = {
        "original_event": event,
        "error": str(error),
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "original_topic": event.get("_topic", "unknown"),
    }
    await producer.send_and_wait(
        topic=DLQ_TOPIC,
        key=event.get("idempotency_key", "").encode(),
        value=dlq_event,
    )
```
DLQ events should include enough context to debug and reprocess later: the original event, the error message, the timestamp, and which topic/consumer failed. For a deeper look at DLQ patterns, see Dead Letter Queues in Streaming Systems.
Reprocessing from the DLQ
Build tooling to inspect and replay DLQ events. A simple approach:
- A CLI tool or admin UI that lists DLQ events with filters (source, time range, error type)
- A “replay” command that moves selected events back to their original topic
- Alerting when the DLQ grows beyond a threshold
Scaling the Receiver
The webhook receiver itself should be stateless — all state lives in Redis (dedup) and Kafka (events). This makes horizontal scaling straightforward:
- Run multiple instances behind a load balancer
- Each instance has its own Kafka producer (producers are thread-safe but not process-safe)
- Redis handles dedup coordination across instances automatically
For high-throughput sources (>10K events/second), consider:
- Batching Kafka produces: Use `linger_ms` and `batch_size` to batch multiple webhook events into a single Kafka produce request
- Connection pooling: Reuse Redis and Kafka connections across requests
- Async all the way: The example above uses async Python; avoid blocking calls in the request path
Monitoring
Instrument these metrics at minimum:
| Metric | Why |
|---|---|
| `webhook_received_total` (by source) | Volume tracking, anomaly detection |
| `webhook_duplicates_total` (by source) | Sender retry behavior, dedup effectiveness |
| `webhook_produce_latency_ms` | Kafka health from the producer side |
| `webhook_produce_errors_total` | Kafka availability issues |
| `dlq_events_total` (by source, error type) | Processing failure rate |
| `consumer_lag` (by topic, consumer group) | Processing falling behind receipt |
Consumer lag is especially important. For a thorough treatment, see Understanding Kafka Consumer Lag.
When You Don’t Need to Build This
Building and maintaining a webhook ingestion layer is real operational work: the receiver, the dedup store, the Kafka cluster, the DLQ pipeline, monitoring, and alerting.
If you’re already using a managed streaming platform, webhook-to-Kafka ingestion may be available as a managed connector. For example, Streamkap provides a webhook source connector that handles receipt, dedup, and Kafka production without custom code.
Choosing the Right Approach for Your Team
The build-vs-buy decision depends on your scale and requirements:
- Low volume (<1K events/day), single source: A simple webhook handler writing to your database might be enough. Kafka is overkill.
- Medium volume, multiple sources: The architecture described here pays for itself in reliability and debuggability.
- High volume or strict SLAs: Consider a managed solution that handles the infrastructure and gives you observability out of the box.
Regardless of approach, the principles remain: ACK immediately, buffer durably, process asynchronously, handle duplicates, and route failures to a DLQ. Get these right and webhook ingestion stops being a source of data loss.
Ready to stream webhook data into Kafka without building custom infrastructure? Streamkap’s managed webhook source connector handles ingestion, deduplication, and delivery to Kafka topics with zero custom code. Start a free trial or learn more about webhook-to-Kafka streaming.