
Engineering

February 25, 2026

8 min read

Field Mapping and Renaming in Streaming Pipelines

Learn how to map, rename, and reorganize fields in real-time streaming data. Align source schemas with destination conventions, handle naming conflicts, and standardize column names.

TL;DR:

  • Field mapping transforms source column names and structures to match destination schemas - renaming, reordering, dropping, and restructuring fields as data flows through the pipeline.
  • Common needs include converting camelCase to snake_case, resolving reserved word conflicts, dropping internal-only columns, and mapping between different schema conventions.
  • Implementing mapping in the pipeline ensures consistent naming across all destinations without modifying the source application.

Every data pipeline sits between two worlds. On one side is the source system - an application database, a message queue, an API - where column names were chosen by application developers to fit their domain model. On the other side is the destination - a data warehouse, a lakehouse, an analytics engine - where naming conventions, reserved words, and schema expectations are often completely different. The gap between those two worlds is where field mapping lives.

Field mapping is the process of transforming column names, reordering fields, dropping unnecessary data, and restructuring records as they move from source to destination. In batch ETL, this typically happens in a transformation layer that runs after data has been extracted. In streaming pipelines, the transformation happens in-flight, applied to every record as it passes through.

Getting field mapping right is one of those quiet fundamentals that separates a pipeline that “works” from one that actually serves the people who depend on it. When column names are consistent, predictable, and well-documented, analysts write queries faster, dashboards break less often, and nobody has to guess whether the customer ID column is called customerId, CustomerID, customer_id, or cust_id.

Common Mapping Operations

Field mapping encompasses a range of operations, from simple renames to structural transformations. Here are the ones you will use most often.

Rename

The most straightforward operation: change a column’s name from what the source calls it to what the destination expects. A MongoDB collection with firstName becomes first_name in Snowflake.

Drop

Remove columns that the destination does not need. Internal audit fields, application-specific metadata, or large text blobs that are irrelevant for analytics can be stripped out before data reaches the destination.

Reorder

Some destinations or downstream consumers expect columns in a specific order. Reordering ensures that the most important or most commonly queried fields appear first.

Add Prefix or Suffix

When merging data from multiple sources into a single destination table, prefixes and suffixes disambiguate columns. A name column from the customers source becomes customer_name; the same column from the vendors source becomes vendor_name.

Case Conversion

Automatically convert all column names from one case convention to another - camelCase to snake_case, PascalCase to lowercase, or any other transformation that can be applied uniformly.
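Taken together, the rename, drop, prefix, and reorder operations amount to simple record transforms. The sketch below illustrates them in Python on a plain dictionary; the function and field names are illustrative, not any particular tool's API:

```python
def apply_mapping(record, renames=None, drop=None, prefix="", order=None):
    """Apply rename, drop, prefix, and reorder operations to one record (a dict)."""
    renames, drop = renames or {}, set(drop or ())
    out = {}
    for key, value in record.items():
        if key in drop:
            continue                                  # drop: strip internal-only fields
        out[prefix + renames.get(key, key)] = value   # rename, then prefix
    if order:                                         # reorder: listed columns first
        ordered = {k: out.pop(k) for k in order if k in out}
        out = {**ordered, **out}
    return out

row = {"firstName": "Alice", "debugFlags": "0x1", "name": "Acme"}
print(apply_mapping(row, renames={"firstName": "first_name"},
                    drop=["debugFlags"], prefix="customer_"))
# {'customer_first_name': 'Alice', 'customer_name': 'Acme'}
```

In a streaming engine the same logic runs per record in-flight; the dictionary stands in for a deserialized message.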

Naming Convention Conflicts

The most common driver of field mapping is the mismatch between naming conventions across systems.

camelCase is the default in most JavaScript and Java ecosystems, and by extension in document databases like MongoDB: orderDate, lineItems, customerId.

snake_case is the convention in PostgreSQL, Python, and most data warehouses: order_date, line_items, customer_id.

PascalCase shows up frequently in .NET applications and SQL Server: OrderDate, LineItems, CustomerId.

When you stream data from a MongoDB source with camelCase fields into a Snowflake destination that expects snake_case, every single column name needs to be transformed. Doing this manually for a handful of columns is tedious but survivable. Doing it for hundreds of columns across dozens of tables is where automation becomes non-negotiable.

Beyond simple convention mismatches, there are structural conflicts. A source might nest data inside a JSON object (address.street, address.city), while the destination expects flattened columns (address_street, address_city). This kind of structural mapping is more involved than a simple rename but follows the same principle: the destination schema should not be dictated by the source’s internal data model.
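That flattening step can be expressed as a small recursive transform. This is a generic sketch under the underscore-separator assumption above, not any specific connector's behavior:

```python
def flatten(obj, parent="", sep="_"):
    """Flatten nested dicts: {"address": {"street": ...}} -> {"address_street": ...}."""
    out = {}
    for key, value in obj.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, path, sep))  # recurse into nested objects
        else:
            out[path] = value
    return out

doc = {"id": 7, "address": {"street": "1 Main St", "city": "Springfield"}}
print(flatten(doc))
# {'id': 7, 'address_street': '1 Main St', 'address_city': 'Springfield'}
```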

Field Mapping in Flink SQL

Apache Flink SQL provides a natural way to express field mapping transformations. If your streaming pipeline uses Flink as its processing engine, column aliases and projection give you precise control over what reaches the destination.

Renaming Columns with Aliases

SELECT
  customerId   AS customer_id,
  firstName    AS first_name,
  lastName     AS last_name,
  orderDate    AS order_date,
  totalAmount  AS total_amount
FROM source_orders;

Each AS clause maps a source column name to its destination equivalent. This is explicit, readable, and easy to maintain.

Dropping Columns with SELECT

The simplest way to drop columns is to omit them from your SELECT list. If the source has 20 columns and the destination only needs 8, select only those 8:

SELECT
  customer_id,
  order_date,
  total_amount,
  status
FROM source_orders;
-- Columns like internal_audit_ts, debug_flags, etc. are excluded

Adding Computed or Constant Fields

Projection can also derive new fields or stamp constants onto every record, not just rename existing ones:

SELECT
  id            AS order_id,
  amount * 100  AS amount_cents,
  'USD'         AS currency,
  NOW()         AS ingested_at
FROM source_orders;

Case Conversion in Nested Structures

When flattening nested JSON fields, you can rename during the projection:

SELECT
  `payload`.`orderId`     AS order_id,
  `payload`.`lineItems`   AS line_items,
  `payload`.`shipTo`.`zipCode` AS ship_to_zip_code
FROM source_events;

Practical Scenarios

MongoDB camelCase to Snowflake snake_case

A MongoDB users collection has documents like:

{
  "userId": "u-12345",
  "firstName": "Alice",
  "lastName": "Chen",
  "emailAddress": "alice@example.com",
  "createdAt": "2025-03-15T10:30:00Z",
  "isVerified": true,
  "lastLoginTs": "2025-06-01T14:22:00Z"
}

The destination Snowflake table should have columns: user_id, first_name, last_name, email_address, created_at, is_verified, last_login_ts. A convention-based auto-mapper handles this without per-column configuration by detecting camelCase boundaries and inserting underscores.
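One common way to implement that boundary detection is a pair of regular expressions; a sketch, applied to the field names from the document above:

```python
import re

def camel_to_snake(name):
    """Insert an underscore at each lowercase/digit-to-uppercase boundary, then lowercase."""
    name = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)       # e.g. userId -> user_Id
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()

fields = ["userId", "firstName", "emailAddress", "isVerified", "lastLoginTs"]
print([camel_to_snake(f) for f in fields])
# ['user_id', 'first_name', 'email_address', 'is_verified', 'last_login_ts']
```

The two-pass form also handles runs of capitals reasonably (e.g. "HTTPServer" becomes "http_server"), which a single regex tends to get wrong.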

SQL Server dbo.Order to a Consolidated Orders Table

SQL Server databases frequently use schema-qualified names like dbo.Order with PascalCase columns: OrderId, OrderDate, CustomerId, TotalAmount. The word Order is a reserved word in virtually every SQL dialect, so the destination table needs a different name - typically orders. The columns need conversion from PascalCase to snake_case, and any column named with a reserved word (like Group, Index, or Select) needs careful handling.

Reserved Word Handling

Reserved words are one of the most frustrating causes of pipeline failures. A column named order, group, select, index, table, or user is perfectly valid in some systems but will cause syntax errors in others.

Different databases have different reserved word lists. user is reserved in PostgreSQL but not in MySQL. index is reserved in most SQL dialects. timestamp is reserved in some destinations but commonly used as a column name in source systems.

There are two strategies for handling reserved words:

Quoting: Wrap the column name in the destination’s quoting character - double quotes for PostgreSQL and Snowflake, backticks for MySQL and BigQuery, square brackets for SQL Server:

-- Snowflake / PostgreSQL
SELECT "order", "group", "timestamp" FROM events;

-- BigQuery / MySQL
SELECT `order`, `group`, `timestamp` FROM events;

Renaming: Map the column to a non-reserved name: order becomes order_name or order_value, group becomes group_name. This is the cleaner long-term approach because it eliminates the need for quoting everywhere the column is referenced.

The best practice is to rename proactively. Quoting works, but it creates a maintenance burden - every query, every dashboard, every dbt model that touches that column has to remember to quote it. Renaming it once in the pipeline saves that effort for every downstream consumer.
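A proactive renamer can be as simple as a lookup against the union of your destinations' reserved-word lists. The set below is a small illustrative subset, not any database's full list:

```python
# Illustrative subset -- real reserved-word lists are per-database and much longer.
RESERVED = {"order", "group", "select", "index", "table", "user", "timestamp"}

def derisk(column, suffix="_col"):
    """Rename a column only if its lowercased name collides with a reserved word."""
    return column + suffix if column.lower() in RESERVED else column

print([derisk(c) for c in ["order", "customer_id", "timestamp"]])
# ['order_col', 'customer_id', 'timestamp_col']
```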

Column Projection

Column projection - selecting only the columns you need - is one of the most underused optimizations in streaming pipelines. Source tables often have columns that are purely operational: _internal_version, _kafka_offset, debug_payload, raw_request_body. Sending all of that to the destination costs storage, slows down queries, and clutters the schema.

The impact is particularly significant with wide tables. A table with 150 columns where analytics only uses 30 means 80% of the data being stored and indexed is waste. In a columnar warehouse like Snowflake or BigQuery, unused columns still consume storage and contribute to metadata overhead.

Drop columns as early in the pipeline as possible. If the transformation layer can strip unnecessary fields before data hits the message broker, you save on network transfer, serialization costs, and storage at every stage downstream.
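In a transform step, projection is just an allow-list filter applied before serialization. The column names below reuse the earlier examples:

```python
ALLOWED = {"customer_id", "order_date", "total_amount", "status"}

def project(record):
    """Keep only allow-listed columns so dropped fields never reach the broker."""
    return {k: v for k, v in record.items() if k in ALLOWED}

wide = {"customer_id": 1, "order_date": "2025-06-01", "total_amount": 42.5,
        "status": "shipped", "internal_audit_ts": 1717243200, "debug_payload": "..."}
print(project(wide))
# {'customer_id': 1, 'order_date': '2025-06-01', 'total_amount': 42.5, 'status': 'shipped'}
```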

Multi-Destination Mapping

Many organizations stream the same source data to multiple destinations - a data warehouse for analytics, a search index for full-text queries, and a cache for low-latency reads. Each destination may need different column names for the same source field.

For example, a customer_email field might need to be:

  • customer_email in Snowflake (snake_case convention)
  • customerEmail in Elasticsearch (JSON/camelCase convention)
  • email in Redis (short keys for memory efficiency)

This requires per-destination mapping configurations. The source schema stays the same, but each destination sink has its own mapping rules applied at the point of delivery. Streamkap supports this pattern by allowing independent transformations on each pipeline destination, so a single source feeds multiple sinks with different field configurations.
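The pattern can be modeled as one rename map per sink, applied at the point of delivery. This is a generic sketch of the idea, not Streamkap's actual configuration format:

```python
# Hypothetical per-destination rename maps -- names chosen for illustration only.
DEST_MAPS = {
    "snowflake":     {"customer_email": "customer_email"},
    "elasticsearch": {"customer_email": "customerEmail"},
    "redis":         {"customer_email": "email"},
}

def map_for(dest, record):
    """Apply one destination's rename map; unmapped fields pass through unchanged."""
    renames = DEST_MAPS[dest]
    return {renames.get(k, k): v for k, v in record.items()}

row = {"customer_email": "alice@example.com", "customer_id": "u-12345"}
for dest in DEST_MAPS:
    print(dest, map_for(dest, row))
```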

Automation: Convention-Based Auto-Mapping

Manually specifying a rename for every column in every table does not scale. Convention-based auto-mapping applies a set of rules to every column name automatically:

  • Case conversion: camelCase to snake_case for all fields
  • Prefix stripping: Remove prefixes like tbl_, col_, or f_ that were artifacts of the source system
  • Reserved word suffixing: Append _col or _field to any column that matches a reserved word list
  • Namespace flattening: Convert nested field paths like address.zipCode to address_zip_code

Auto-mapping handles the 90% case. You configure the convention once, and it applies to every table and every column. For the remaining 10% - columns that need a specific, non-obvious name - you add explicit overrides.

This two-tier approach (convention-based defaults plus explicit overrides) keeps the configuration manageable even for pipelines with hundreds of tables and thousands of columns.
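The two tiers compose naturally: check the override table first, then fall back to the convention rules. A sketch combining prefix stripping and case conversion (the prefix list and the override are illustrative):

```python
import re

def convention(name):
    """Tier 2 default: strip common source prefixes, then camelCase -> snake_case."""
    name = re.sub(r"^(tbl_|col_|f_)", "", name)                    # prefix stripping
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()    # case conversion

def map_column(name, overrides=None):
    """Tier 1: an explicit override wins; otherwise apply the convention."""
    return (overrides or {}).get(name) or convention(name)

cols = ["tbl_OrderId", "customerId", "legacyFlagX"]
print([map_column(c, overrides={"legacyFlagX": "is_legacy"}) for c in cols])
# ['order_id', 'customer_id', 'is_legacy']
```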

Best Practices

Document every mapping. Maintain a mapping registry that records which source column maps to which destination column, who approved the mapping, and when it was last changed. This is your Rosetta Stone when debugging data quality issues six months from now.

Version control your mapping configuration. Treat mapping definitions like code. Store them in Git, review changes through pull requests, and maintain a changelog. When a mapping change causes a downstream issue, you need to be able to trace exactly what changed and when.

Establish naming standards early. Agree on a single destination naming convention (snake_case is the most widely compatible) and enforce it from day one. Retrofitting naming standards onto an existing warehouse with hundreds of tables is far more painful than setting the convention before the first table is created.

Test mappings with real data. A mapping that looks correct on paper can fail on edge cases - Unicode characters in column names, extremely long field names that exceed destination limits, or nested fields with dots in their keys. Run sample records through the pipeline and verify the output schema before going to production.

Handle nulls and defaults explicitly. When you drop a source column that the destination schema marks as NOT NULL, you need a default value. When you add a computed column, define what happens when the input fields are null. These edge cases are where silent data quality issues hide.

Monitor for schema drift after mapping. Field mapping does not replace schema evolution - it complements it. When a source adds a new column, your mapping configuration needs to account for it. Set up alerts for unmapped columns so new fields do not slip through without a deliberate decision about how (or whether) to include them in the destination.

The goal of field mapping is to make the destination schema serve its consumers, not mirror its sources. Source systems are built for applications. Destinations are built for analysis, reporting, and decision-making. The mapping layer is where you translate between those two purposes, and getting it right means every query written against the destination is cleaner, faster, and less error-prone.