
AI & Agents

May 22, 2025

11 min read

Agent Tool Use vs. Streaming Context: Two Ways to Give Agents Fresh Data

Compare tool-use (direct DB queries) vs. streaming context (CDC-fed caches) for AI agents — latency, cost, failure modes, and when to use each.

Every AI agent that interacts with business data faces the same question: how does it get the information it needs?

There are two fundamental approaches, and they have very different trade-offs. Understanding when to use each — and how to combine them — is the difference between an agent that works in a demo and one that works at scale.

Approach 1: Tool Use (Direct Queries)

The agent calls a function. That function queries a database, calls an API, or runs a computation. The result comes back to the agent in the same turn.

# Tool use: agent calls this function directly
def get_customer(customer_id: int) -> dict:
    conn = get_db_connection()  # e.g. a psycopg2 connection from a pool
    cursor = conn.cursor()
    cursor.execute(
        "SELECT id, name, email, plan, mrr_cents "
        "FROM customers WHERE id = %s", (customer_id,)
    )
    row = cursor.fetchone()
    conn.close()
    if row is None:
        return {}  # no such customer
    return dict(zip(["id", "name", "email", "plan", "mrr_cents"], row))

This is the pattern in every LLM function-calling tutorial. It’s simple, it’s direct, and the data is always current.

Where Tool Use Works Well

  • Low agent volume: A handful of agents making occasional lookups
  • Write operations: The agent needs to update a record, not just read it
  • Complex queries: Ad-hoc joins, aggregations, or filters that change per request
  • Transactional reads: The agent needs to read-then-write atomically
  • Prototyping: Get something working before optimizing

Where It Breaks Down

The problem shows up at scale. Every tool call is a database query. An agent interaction might involve 3-8 tool calls. Multiply by concurrent users:

| Concurrent agents | Tool calls/interaction | Queries/second on source DB |
|-------------------|------------------------|-----------------------------|
| 10                | 5                      | ~50                         |
| 100               | 5                      | ~500                        |
| 1,000             | 5                      | ~5,000                      |
| 10,000            | 5                      | ~50,000                     |

At 10 agents, it’s nothing. At 1,000, you’re adding significant load to your production database — the same database serving your application. At 10,000, you likely need dedicated read replicas just for agent traffic.
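The numbers in the table follow from a simple product. A back-of-the-envelope sketch (the one-interaction-per-agent-per-second rate is an assumption chosen to match the table, not a measured figure):

```python
# Illustrative load estimate: queries/second hitting the source DB
# under pure tool use. Assumes each agent averages roughly one
# interaction per second (an assumption, tune to your workload).
def source_db_qps(concurrent_agents: int,
                  tool_calls_per_interaction: int = 5,
                  interactions_per_agent_per_sec: float = 1.0) -> float:
    return (concurrent_agents
            * tool_calls_per_interaction
            * interactions_per_agent_per_sec)

print(source_db_qps(1_000))  # ~5,000 queries/second
```

Plugging in your own interaction rate makes it easy to see where your deployment sits on this curve before the database tells you the hard way.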

There’s also latency. Each tool call has to:

  1. Acquire a database connection (or get one from a pool)
  2. Send the query over the network
  3. Wait for the database to execute it
  4. Receive the result
  5. Return it to the agent

That’s 50-200ms per call. Over 5 calls, the agent spends 250ms-1s just waiting for data, not counting LLM inference time.
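Before optimizing, it helps to measure where each tool call's time actually goes. A minimal sketch of per-call instrumentation (the `timed_tool` decorator and the `time.sleep` stand-in for a database round trip are illustrative, not part of any real framework):

```python
import time
from functools import wraps

def timed_tool(fn):
    """Wrap a tool function and log its wall-clock latency per call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{fn.__name__}: {elapsed_ms:.1f}ms")
    return wrapper

@timed_tool
def get_customer(customer_id: int) -> dict:
    time.sleep(0.05)  # stand-in for a ~50ms DB round trip
    return {"id": customer_id}
```

Logging latency per tool call is what lets you identify the hot paths worth moving to a cache later.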

Approach 2: Streaming Context (CDC-Fed Store)

Instead of querying on demand, you keep a read-optimized store continuously updated with CDC. The agent reads from this store, which is designed for the exact access patterns the agent needs.

# Streaming context: agent reads from pre-populated cache
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_customer(customer_id: int) -> dict:
    data = r.hgetall(f"customer:{customer_id}")
    if not data:
        return {}  # cache miss: fall back to a direct query or report not found
    return {
        "id": int(data["id"]),
        "name": data["name"],
        "email": data["email"],
        "plan": data["plan"],
        "mrr_cents": int(data["mrr_cents"]),
    }

The data gets there through a pipeline:

Source DB → CDC → Kafka → Consumer → Redis (or any read store)

The consumer watches for changes and updates the cache within seconds. By the time the agent asks for data, it’s already there.
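The consumer's job is mechanical: map each change event to a cache write. A minimal sketch of that core logic, using a plain dict in place of Redis and Debezium-style `op`/`before`/`after` fields (the key naming scheme is an assumption for illustration):

```python
# Apply one CDC event to the read store: upsert on create/update,
# remove on delete. A dict stands in for Redis here.
def apply_change(cache: dict, event: dict) -> None:
    op = event["op"]  # Debezium convention: "c"=create, "u"=update, "d"=delete
    if op in ("c", "u"):
        row = event["after"]
        cache[f"customer:{row['id']}"] = row
    elif op == "d":
        cache.pop(f"customer:{event['before']['id']}", None)

cache = {}
apply_change(cache, {"op": "c", "after": {"id": 42, "name": "Acme", "plan": "pro"}})
apply_change(cache, {"op": "u", "after": {"id": 42, "name": "Acme", "plan": "enterprise"}})
print(cache["customer:42"]["plan"])  # enterprise
```

In production this function would sit inside a Kafka consumer loop and write to Redis instead of a dict, but the upsert/delete mapping is the whole trick.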

Where Streaming Context Works Well

  • High agent volume: The cost doesn’t scale with the number of agents
  • Read-heavy access patterns: Customer lookups, order history, product catalog
  • Low-latency requirements: Sub-millisecond reads every time
  • Multiple consumer pattern: The same data feeds agents, dashboards, and APIs
  • Source DB protection: Zero additional load on production databases

Where It Has Limitations

  • Data freshness: 1-5 seconds behind the source, depending on pipeline configuration
  • Fixed access patterns: You need to design the cache schema upfront
  • Infrastructure cost: Kafka + cache + CDC pipeline is more to run than a database connection
  • Setup complexity: More moving parts than a direct query

Side-by-Side Comparison

Here’s how the two approaches compare across the dimensions that matter in production:

Latency

| Operation                 | Tool Use  | Streaming Context   |
|---------------------------|-----------|---------------------|
| Single lookup             | 50-200ms  | <1ms                |
| 5 lookups per interaction | 250ms-1s  | <5ms                |
| Complex join              | 100-500ms | <1ms (pre-computed) |

Streaming context wins on read latency by 2-3 orders of magnitude. The trade-off is a 1-5 second delay before new data appears in the cache.

Source Database Load

| Agents | Tool Use (queries/sec) | Streaming Context (queries/sec) |
|--------|------------------------|---------------------------------|
| 10     | ~50                    | 0                               |
| 100    | ~500                   | 0                               |
| 1,000  | ~5,000                 | 0                               |

With streaming context, the source database load is constant regardless of agent count. CDC reads from the write-ahead log, which the database produces anyway. For more on CDC’s impact on source databases, see What Is Change Data Capture.

Data Freshness

Tool use gives you the current state at the moment of query. Streaming context gives you the state as of a few seconds ago:

Timeline:
  t=0s    Record updated in source DB
  t=0s    Tool use sees the new value immediately
  t=1-5s  CDC event arrives in cache
  t=1-5s  Streaming context now sees the new value

For most agent use cases — customer support, order lookup, inventory checks — a 1-5 second delay is imperceptible. For financial trading or real-time bidding, it matters.

Failure Modes

This is where the approaches differ most:

Tool use failure modes:

  • Database is down → agent cannot answer the question
  • Database is slow → agent response is slow
  • Connection pool exhausted → agent requests fail
  • Network partition → agent is completely blind

Streaming context failure modes:

  • CDC pipeline stops → cache serves last-known data (stale but functional)
  • Cache is down → agent needs a fallback (direct query or error)
  • Kafka is down → new changes don’t flow, but cache still serves reads
  • Source DB is down → cache continues serving, no impact on agents

The streaming approach degrades gracefully. When something breaks in the pipeline, the cache keeps serving the last-known good data. The agent keeps working — it just doesn’t see changes made during the outage.

Tool use fails hard. When the database is down, every agent interaction that needs data fails completely.
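A common way to get the best of both failure modes is a cache-first read with a direct-query fallback. A hedged sketch, with `read_cache` and `read_db` as illustrative stand-ins for the Redis and database calls:

```python
# Cache-first read with fallback: stay fast when the cache is healthy,
# stay functional when it isn't. `read_cache` / `read_db` are stand-ins.
def get_customer_with_fallback(customer_id: int, read_cache, read_db) -> dict:
    try:
        data = read_cache(customer_id)
        if data:
            return data
    except ConnectionError:
        pass  # cache down: fall through to the source of truth
    return read_db(customer_id)  # slower, but keeps the agent answering

# Usage with stand-in functions:
hit = get_customer_with_fallback(1, lambda _: {"id": 1}, lambda _: {"id": 1, "src": "db"})
miss = get_customer_with_fallback(2, lambda _: None, lambda _: {"id": 2, "src": "db"})
```

Note that the fallback reintroduces source-DB load, so it is worth tracking how often it fires — frequent fallbacks usually mean the cache schema is missing an access pattern.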

Cost at Scale

For a concrete comparison, consider 1,000 concurrent agents, each handling 100 interactions per hour with 5 lookups per interaction:

Tool use:

  • 500,000 queries/hour on source DB
  • Need larger DB instance or read replicas to handle the load
  • Estimated cost: $500-2,000/month for additional database compute

Streaming context:

  • CDC pipeline: fixed cost regardless of agent count
  • Redis cache: ~$50-200/month for a small cluster
  • Kafka: shared with other consumers (amortized)
  • Estimated cost: $200-500/month total

The streaming approach has a higher fixed cost (you need the CDC pipeline and cache infrastructure) but near-zero marginal cost per agent. Tool use has low fixed cost but scales linearly with agent volume.
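That fixed-vs-marginal structure can be sketched as a toy cost model. All figures here are the illustrative ranges from this article collapsed to midpoints, not vendor pricing:

```python
# Toy monthly-cost model. $1.50/agent and $350 fixed are illustrative
# midpoints of the ranges discussed above, not real pricing.
def tool_use_cost(agents: int, per_agent_db_cost: float = 1.5) -> float:
    """Direct queries: DB cost grows roughly linearly with agent count."""
    return agents * per_agent_db_cost

def streaming_cost(agents: int, fixed_infra: float = 350.0) -> float:
    """CDC + cache: fixed pipeline cost, near-zero marginal cost per agent."""
    return fixed_infra

for n in (100, 1_000, 10_000):
    print(n, tool_use_cost(n), streaming_cost(n))
```

Under these assumptions the lines cross somewhere in the low hundreds of agents; the exact crossover depends entirely on your query mix and infrastructure, so treat this as a shape, not a number.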

Hybrid: Combining Both Approaches

In practice, most production agent systems use both. The key is choosing the right approach for each type of operation:

# Hybrid agent tools

import redis
import psycopg2

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# READ operations → streaming context (fast, no DB load)
def get_customer(customer_id: int) -> dict:
    """Read from CDC-fed cache. <1ms, zero DB load."""
    return r.hgetall(f"customer:{customer_id}")

def get_order_history(customer_id: int) -> list:
    """Read from CDC-fed cache. Pre-indexed by customer."""
    order_ids = r.smembers(f"customer:{customer_id}:orders")
    pipe = r.pipeline()
    for oid in order_ids:
        pipe.hgetall(f"order:{oid}")
    return pipe.execute()

# WRITE operations → direct tool use (transactional)
def update_order_status(order_id: int, new_status: str) -> dict:
    """Write directly to the source DB. Needs transactional guarantee."""
    conn = psycopg2.connect("postgresql://...")
    cur = conn.cursor()
    cur.execute(
        "UPDATE orders SET status = %s, updated_at = NOW() "
        "WHERE id = %s RETURNING id, status",
        (new_status, order_id)
    )
    result = cur.fetchone()
    conn.commit()
    conn.close()
    # CDC will propagate this change to the cache within seconds
    return {"id": result[0], "status": result[1]}

# COMPLEX QUERIES → direct tool use (ad-hoc)
def search_orders(filters: dict) -> list:
    """Complex search with dynamic filters — use the DB."""
    conn = psycopg2.connect("postgresql://...")
    # Build dynamic query based on filters...
    # This is where a DB's query planner earns its keep

The rule of thumb: reads from cache, writes through the database, complex queries against the database.

After a write operation, the agent can tell the user “Your order status has been updated” with confidence — the write went directly to the database. The CDC pipeline will propagate the change to the cache within seconds, so subsequent reads will reflect it.

Decision Framework

Use this to decide which approach (or combination) fits your use case:

Use Tool Use (Direct Queries) When:

  • Agent volume is low (<100 concurrent)
  • You need transactional consistency (read-modify-write)
  • Queries are ad-hoc and unpredictable
  • You’re prototyping and need to move fast
  • Data must be perfectly current (zero staleness tolerance)

Use Streaming Context When:

  • Agent volume is high or expected to grow
  • Access patterns are known and repeated (lookup by ID, list by entity)
  • Source database is already under load and can’t absorb more
  • You need sub-millisecond response times
  • The same data feeds multiple consumers (agents, dashboards, APIs)
  • Graceful degradation matters more than perfect freshness

Use Both When:

  • You have a mix of reads and writes
  • Some operations need transactional guarantees, others don’t
  • You want to protect the source DB from read load while still supporting writes
  • You’re building a production system that needs to scale

For a deeper exploration of how agents consume real-time data, see Real-Time Data for AI Agents. For the broader architecture of keeping agent context fresh, see Context Layer for AI Agents.

Implementation Path

If you’re starting from scratch:

  1. Start with tool use. Get the agent working with direct queries. Validate the use case before adding infrastructure.
  2. Identify hot paths. Which tool calls happen most frequently? Which ones add the most load? Use logging and metrics to find out.
  3. Move hot reads to streaming context. Set up CDC for the tables behind your most-called read tools. Build a cache consumer. Swap the tool implementation from DB query to cache read.
  4. Keep writes as tool use. There’s rarely a reason to change write operations — they need to go to the source of truth.
  5. Add monitoring. Track cache freshness, hit rates, and fallback frequency. Alert when the CDC pipeline falls behind.

This incremental approach lets you get value from both patterns without a large upfront investment. You add streaming context where the data justifies it and keep direct queries where they’re simpler and sufficient.
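For step 5, the simplest freshness signal is a timestamp the consumer stamps on every write, which a monitor compares against the clock. A minimal sketch (the `meta:last_event_ts` key name and the dict standing in for Redis are illustrative assumptions):

```python
import time

# The cache consumer stamps each write with the source event's timestamp;
# a monitor computes the lag and alerts past a threshold.
def record_freshness(cache: dict, source_event_ts: float) -> None:
    cache["meta:last_event_ts"] = source_event_ts

def freshness_lag_seconds(cache: dict, now=None) -> float:
    now = now if now is not None else time.time()
    return now - cache.get("meta:last_event_ts", 0.0)

cache = {}
record_freshness(cache, source_event_ts=100.0)
print(freshness_lag_seconds(cache, now=103.0))  # 3.0 — three seconds behind
```

Alerting when this lag exceeds your staleness budget (a few seconds for most agent use cases) catches a stalled pipeline long before users notice stale answers.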

The Cost of Doing Nothing

The hidden cost of the pure tool-use approach is that it creates a coupling between agent adoption and database capacity. Every new agent use case means more load on the source database. At some point, the DBA starts pushing back on new agent features because the database can’t absorb more read traffic.

Streaming context breaks that coupling. The CDC pipeline captures changes once, and any number of downstream consumers — agents, dashboards, search indexes, caches — read independently. Adding a new agent use case means adding a new cache read pattern, not a new database query.


Ready to give your agents fast, reliable context without loading your source databases? Streamkap streams CDC events from PostgreSQL, MySQL, MongoDB, and more into any downstream store, so your agents get sub-millisecond reads with data that’s seconds fresh. Start a free trial or learn more about Streamkap for AI agents.