Tutorials

How to Stream AWS SQL Server Data to Iceberg on AWS with Streamkap

Paul is the CEO and Co-Founder of Streamkap

August 20, 2025

Introduction

In today's fast-paced world, every millisecond counts. Relying on stale data isn't just inefficient; it's a direct path to missed opportunities and competitive disadvantage. Traditional batch pipelines simply can't keep up, but there's a better way to unlock the true power of your information.

In this guide, we'll explore how to leverage your data hosted in SQL Server on AWS by streaming it to Apache Iceberg on S3. This modern architecture provides a robust, scalable foundation for advanced analytics and AI/ML workflows, offering features like ACID transactions for data reliability and schema evolution without disruption.

Streaming SQL Server data to Iceberg on S3 creates powerful opportunities for data sharing across your organization, establishing a robust data lake architecture that serves as the foundation for advanced AI/ML workflows. With Iceberg, you gain time-travel capabilities for historical analysis and optimized query performance on massive datasets.
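
To make the time-travel point concrete, here is a minimal sketch of the kind of query Iceberg enables once your data lands in S3. The database and table names are hypothetical, and the syntax shown is the Athena/Spark SQL flavor:

```sql

-- Query the table as it existed at an earlier point in time
-- (hypothetical names; Athena-style Iceberg time-travel syntax)
SELECT *
FROM my_glue_db.music
FOR TIMESTAMP AS OF TIMESTAMP '2025-08-01 00:00:00 UTC';

```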

This guide reveals how Streamkap redefines data mobility through effortless integration and performance. For engineers prioritizing zero-maintenance pipelines or executives demanding live business intelligence, discover how to harness SQL Server's full potential in Iceberg on S3 without code or complexity.

Guide sections:

Prerequisites: You'll need accounts for AWS and Streamkap.
Setting up a New RDS SQL Server from Scratch: This step will help you set up SQL Server inside AWS.
Configuring an Existing RDS SQL Server: For an existing SQL Server database, you will have to modify its configuration to make it Streamkap-compatible.
Preparing your Iceberg in S3 destination: This step will guide you in preparing your S3 environment and establishing the foundation for your Iceberg tables.
Configuring Access for Existing Iceberg on S3: For an existing Iceberg setup on S3, you will need to adjust permissions and configurations to allow Streamkap to write and manage data.
Streamkap Setup: Adding SQL Server as a source, adding Iceberg on S3 as a destination, and finally connecting them with a pipeline.


Prerequisites

To follow along with this guide, make sure you have these prerequisites covered:

  1. Streamkap Account: You'll need an active Streamkap account with admin or data admin privileges. If you don't have one, you can sign up here or request access from your admin.

  2. Amazon AWS Account: An active AWS account is essential. This account needs the necessary permissions to set up and manage an AWS RDS instance. It also requires permissions for Amazon S3 (to handle your Iceberg data buckets and objects) and AWS Glue Data Catalog if you plan to use it for Iceberg table metadata.

SQL Server Set-Up

Setting up a New RDS SQL Server from Scratch 

Note on IAM Permissions: To successfully set up a new AWS RDS SQL Server instance, ensure your IAM user or role has the necessary permissions for creating and managing RDS instances, including parameter groups, subnet groups, networking, security, monitoring, and optional features like encryption and tagging.

Step 1: Access the AWS Management Console

  • Log in to your AWS Management Console.
  • In the search bar, type "RDS" and select the "RDS" service from the results.

Step 2: Create a New RDS SQL Server Instance

  • From the RDS dashboard, select a region for your RDS instance that aligns with your application's requirements.

Note: For seamless integration with Streamkap, choose one of the following regions:

  • Oregon (us-west-2)
  • N. Virginia (us-east-1)
  • Europe (Ireland) (eu-west-1)
  • Asia Pacific (Sydney) (ap-southeast-2)

  • Navigate to the "Databases" section or select "DB Instances" under "Resources."
  • Click on "Create database" to begin the setup.
  • In the "Choose a database creation method" section, select "Standard create."
  • Under the "Engine options" section, select "SQL Server" as the database engine.
  • For "Database management type," select "Amazon RDS" to leverage the fully managed service.
  • Choose the “Standard Edition” for cost-effective, small-to-medium workloads with essential features, or 
  • Opt for the “Enterprise Edition” for high scalability, advanced security, and complex workloads that require features like advanced analytics and high availability.
  • Select “SQL Server version 2016 or higher” to ensure compatibility and optimal performance.
  • If this is your first time setting up, consider choosing a "Dev/Test environment" for initial configuration to ensure everything is set up correctly.
  • For your database identifier name, we recommend using sourcedatabase to follow standard naming conventions, which helps with clarity and consistency. However, feel free to choose a name that's more personalized to your setup if you prefer.
  • Next, choose a master username that's easy to recall and fits your configuration. For the password, opt for a "Self-managed password," then enter a strong, secure one. Be sure to save it in a safe place, as you'll need it again later.
  • If you're new to this and looking to gain hands-on experience before deploying live, we suggest selecting "memory optimized classes (r and x classes)" and picking db.r5.large as your instance type. This offers a cost-effective solution suitable for testing.
  • For storage, choose "General Purpose SSD" for a good balance of performance and cost. If your project has specific needs, you can certainly select a different storage option that better meets those requirements.
  • Under "Availability and Durability," select "No" if you don't require high availability or failover capabilities at this stage. 
  • To simplify your setup and concentrate solely on the database instance, choose "Don't connect EC2."
  • In the VPC settings, opt to "Create new VPC," then use the dropdown menu to set up a new subnet for streamlined connectivity. 
  • For easy management and access, enable public access by selecting "Yes" under "Public access" in the Connectivity section.
  • Alternatively, if you prefer a simpler setup, head to the VPC settings section, select "Choose existing VPC," and use the "default VPC" for a hassle-free configuration.
  • Uncheck "Enable Enhanced monitoring" to reduce complexity.
  • Once you’ve reviewed all the settings and configurations, take a final look at everything, and then click on “Create Database” to initiate the creation process.
  • You'll then need to wait for the database status to change to "Available."
Database status

Step 3: Configuring Your SQL Server Instance for Streamkap Compatibility

Note: You can execute the following SQL queries using local tools like Azure Data Studio or DBeaver.

  • To align with naming conventions, create a new database named source-database by running this query:

```sql

CREATE DATABASE [source-database];

```

  • Verify the database creation by executing the query below. A successful creation will display the database's name:

```sql

SELECT name FROM sys.databases WHERE name = 'source-database';

```

  • To enable Change Data Capture (CDC) on your source-database, allowing Streamkap to track and capture all changes for replication, run the following command:

```sql

EXEC msdb.dbo.rds_cdc_enable_db 'source-database';

```

  • To confirm that CDC has been successfully enabled, execute this query:

```sql

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'source-database';

```

  • The output should confirm CDC's status for the source-database, matching the expected result (e.g., is_cdc_enabled will be 1).
CDC verification

Configuring an Existing RDS SQL Server for Streamkap Compatibility

If you're using an existing AWS RDS SQL Server instance, it might not be configured for Change Data Capture (CDC), which is essential for Streamkap. You'll need to verify and enable CDC if it's not already active.

Note: For all queries below, remember to replace "your-database-name" with the actual name of your database.

  • First, run the following query to check whether CDC is already enabled:

```sql

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'your-database-name';

```

If the is_cdc_enabled column returns 1, CDC is already active, and you can skip the rest of this section.

Enable CDC on database

  • If CDC is not enabled, run the following query to activate Change Data Capture on your database. This will allow Streamkap to track and capture all data changes for replication purposes:

```sql

EXEC msdb.dbo.rds_cdc_enable_db 'your-database-name';

```

  • To confirm that CDC has been successfully enabled, execute the following query once more:

```sql

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'your-database-name';

```

  • The output should now show 1 for is_cdc_enabled, indicating that CDC is active for your database.
CDC verification

Configuring RDS SQL Server for Streamkap Integration

  • Access your RDS SQL Server database using your preferred tool, such as Azure Data Studio or DBeaver, and execute the following SQL script. 
  • This script will set up a new user and role with the necessary permissions for the Streamkap connector to interact with your database for streaming and integration tasks.

```sql

-- Switch to the 'source-database'
USE [source-database];
GO


-- Create a login for 'streamkap_user' with a strong password (replace 'your_password_here')
CREATE LOGIN streamkap_user WITH PASSWORD = 'your_password_here';
GO


-- Create a user 'streamkap_user' within 'source-database' linked to the login
CREATE USER streamkap_user FOR LOGIN streamkap_user;
GO


-- Create a custom role named 'streamkap_role'
CREATE ROLE streamkap_role;
GO


-- Add 'streamkap_user' to the 'streamkap_role'
ALTER ROLE streamkap_role ADD MEMBER streamkap_user;
GO


-- Grant SELECT permissions on the entire 'source-database' to 'streamkap_role'
GRANT SELECT ON DATABASE::[source-database] TO streamkap_role;
GO


-- Grant ALTER and SELECT permissions on the 'music_data' schema to 'streamkap_role'
-- (the schema is created in the next step; if you run this script first,
-- create it beforehand with: CREATE SCHEMA music_data;)
GRANT ALTER, SELECT ON SCHEMA::music_data TO streamkap_role;
GO


-- Grant CREATE TABLE permissions to 'streamkap_role'
GRANT CREATE TABLE TO streamkap_role;
GO

```
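
  • To sanity-check the setup, the following queries (standard SQL Server catalog views) confirm that the user exists and belongs to the role:

```sql

-- Confirm the user and role exist in the current database
SELECT name, type_desc
FROM sys.database_principals
WHERE name IN ('streamkap_user', 'streamkap_role');

-- Confirm 'streamkap_user' is a member of 'streamkap_role'
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members drm
JOIN sys.database_principals r ON drm.role_principal_id = r.principal_id
JOIN sys.database_principals m ON drm.member_principal_id = m.principal_id
WHERE r.name = 'streamkap_role';

```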

  • To create a new schema and table, which can be used to verify successful data transfer later, execute the following queries:


```sql

CREATE SCHEMA music_data;


-- Create a 'Music' table within the 'music_data' schema
CREATE TABLE music_data.Music (
    artist_band VARCHAR(255) NOT NULL,
    song VARCHAR(255) NOT NULL
);




-- Insert data into the music_data.Music table
INSERT INTO music_data.Music (artist_band, song) VALUES ('Ed Sheeran', 'Shape of you');
INSERT INTO music_data.Music (artist_band, song) VALUES ('Queen', 'Bohemian Rhapsody');
INSERT INTO music_data.Music (artist_band, song) VALUES ('Back Street Boys', 'Show me the meaning of being lonely');
INSERT INTO music_data.Music (artist_band, song) VALUES ('Blue', 'One Love');
GO

```
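
  • To confirm the sample rows were inserted, run a quick check:

```sql

SELECT artist_band, song FROM music_data.Music;

```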

  • To enable the Streamkap connector to perform "Snapshots" for initial data backfilling, you must create a dedicated table for this process. Run the following query to create it.

Note: Ensure the table name is exactly "STREAMKAP_SIGNAL". Any variation will not be recognized.

```sql

-- Create the 'streamkap' schema if it doesn't exist
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'streamkap')
BEGIN
    EXEC('CREATE SCHEMA streamkap');
END;




-- Create the 'streamkap_signal' table within the 'streamkap' schema
CREATE TABLE streamkap.streamkap_signal (
    id VARCHAR(255) PRIMARY KEY,
    type VARCHAR(32) NOT NULL,
    data VARCHAR(2000) NULL
);




-- Grant necessary privileges (SELECT, UPDATE, INSERT) on 'streamkap.streamkap_signal' to 'streamkap_role'
GRANT SELECT, UPDATE, INSERT ON streamkap.streamkap_signal TO streamkap_role;

```
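
  • For context, Streamkap itself writes rows into this table to trigger actions such as ad-hoc snapshots; you never need to insert rows manually. Purely for illustration, and assuming a Debezium-style signal format, such a row could look like this:

```sql

-- Illustrative only (assumed Debezium-style signal format);
-- Streamkap manages this table itself
INSERT INTO streamkap.streamkap_signal (id, type, data)
VALUES ('ad-hoc-1', 'execute-snapshot',
        '{"data-collections": ["source-database.music_data.Music"]}');

```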

Note: Microsoft recommends storing CDC data files separately from primary database files. However, in RDS SQL Server, direct access to file paths is not available.

  • Execute the following query to enable CDC on your source tables
  • Make sure to replace {schema} and {table} with the correct schema and table names. In our case, the schema is “music_data”, and the table is “Music”.

```sql

EXEC sys.sp_cdc_enable_table  
    @source_schema        = N'music_data', 
    @source_name          = N'Music', 
    @role_name            = N'streamkap_role',
    @supports_net_changes = 0;
GO

```

  • For enabling CDC on the STREAMKAP_SIGNAL table, use:

```sql

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'streamkap',
    @source_name          = N'streamkap_signal',
    @role_name            = N'streamkap_role',
    @supports_net_changes = 0;
GO

```
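
  • To confirm that both capture instances were created, query the CDC metadata:

```sql

-- List all tables with an active CDC capture instance
SELECT s.name AS schema_name, t.name AS table_name, ct.capture_instance
FROM cdc.change_tables AS ct
JOIN sys.tables AS t ON ct.source_object_id = t.object_id
JOIN sys.schemas AS s ON t.schema_id = s.schema_id;

```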

Iceberg Set-Up

Whether you're new to Iceberg or already familiar with data lake technologies, this guide will walk you through the essentials. From setting up your environment to configuring your storage and catalog, we’ll ensure a smooth and efficient experience as you begin working with Apache Iceberg for reliable, scalable data management.

Preparing your Iceberg in S3 destination

Step 1: Log in to the AWS Management Console

  • Log in to your AWS Management Console and select the region you specified in the above section.

Let’s create a bucket now:

  • Go to the search bar, type “S3,” and select S3.
  • Now click “Create bucket”.

  • Give your S3 bucket a unique name that’s easy to identify later.

  • Public Access: Leave "Block all public access" checked (the default) so the bucket is not publicly accessible. Streamkap reads and writes through an IAM role, so no public permissions are needed.
  • Leave the other settings at their defaults and click "Create bucket" to proceed with the setup.

Step 2: Create new AWS Glue database

  • Type "Glue" in the search bar and click on "AWS Glue" from the results.
  • Select the same region you used when creating the S3 bucket; in this guide, we use us-west-2.
  • From the left navigation bar, go to "Databases" to create a database for your data.
  • Click on "Add database" as depicted in the picture below.
  • Provide a unique database name and include a clear, descriptive explanation to define its purpose within AWS Glue.
  • Then, click on “Create database” to finalize the creation.
  • Once the database is successfully created, it will appear in your list of databases, as shown in the screenshot below.

Step 3: Create new Policy and IAM role for Streamkap connection 

  • Type "IAM" in the search bar, then select the IAM service.
  • Navigate to Policies in the left menu and click Create Policy.
  • Search for S3 and select AmazonS3FullAccess as a base policy.
  • Permission requirements for S3 and Glue:
    • S3: s3:GetObject, s3:PutObject, s3:DeleteObject, s3:ListBucket
    • Glue: glue:GetDatabase, glue:GetDatabases, glue:CreateTable, glue:GetTable, glue:GetTables, glue:UpdateTable, glue:DeleteTable, glue:CreatePartition, glue:GetPartition, glue:GetPartitions, glue:BatchCreatePartition, glue:UpdatePartition, glue:DeletePartition, glue:BatchDeletePartition

  • The console first displays the visual policy editor. You can either build the policy there or toggle to the JSON tab and paste the following policy for S3 and AWS Glue, replacing the placeholder values:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:CreatePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition"
            ],
            "Resource": [
                "arn:aws:glue:<your-region>:<your-account-id>:catalog",
                "arn:aws:glue:<your-region>:<your-account-id>:database/<your-database-name>",
                "arn:aws:glue:<your-region>:<your-account-id>:table/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<your-s3-bucket-name>",
                "arn:aws:s3:::<your-s3-bucket-name>/*"
            ]
        }
    ]
}
```

  • Click "Next".
  • Enter a policy name such as Streamkap_warehouse_policy (or any name of your choice).
  • Enter a description such as "This policy will be used to put data into S3."
  • Click "Create policy" in the bottom right corner.
  • On the left side, navigate to "Roles" and click "Create role" in the top right corner.

  • Select "Custom trust policy" and enter the policy below, which allows the Streamkap production tenant access to your AWS Glue instance.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::300973880807:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}
```

  • Next, under "Filter by type," choose "Customer managed." Then select the policy you just created and click "Next" to proceed.
  • Provide a meaningful name and a detailed description for the role to help identify its purpose easily in the future. After carefully reviewing all the policy settings to ensure they meet your requirements, click on the “Create role” button to finalize the role creation process.
  • After the role is created, it will appear as shown in the picture below, confirming that the setup was successful and the role is ready for use.

Navigate to the IAM role and copy its Role ARN. We'll use this ARN when setting up the connection with Streamkap.

Step 4: Add the Role to AWS Glue

  • Type "Glue" in the search bar and click on "AWS Glue" from the results.
  • Click on “Set up roles and users” to grant AWS Glue access and assign a default IAM role for your account.
  • Select the role you created earlier and click Next to continue.
  • Choose the specific S3 bucket you want AWS Glue to access, grant it read-only permissions, and then click Next to proceed.

Select the role you created earlier to set it as the default IAM role, then click Next to continue.

  • Review all the changes carefully, then click on Apply changes to finalize the configuration.

Configuring Access for an Existing Iceberg on S3 Destination

If you already have an Iceberg warehouse in S3, there is no new bucket or Glue database to create; you only need to grant Streamkap access to your existing resources. Follow the same two steps described above: create the IAM policy and role ("Step 3: Create new Policy and IAM role for Streamkap connection"), then register the role with AWS Glue ("Step 4: Add the Role to AWS Glue"). The only difference is that the policy's Resource entries should reference your existing S3 bucket and Glue database names. As before, copy the resulting Role ARN for the Streamkap destination setup.

Streamkap Setup 

To connect Streamkap to SQL Server, we need to ensure the database is configured to accept traffic from Streamkap by safelisting Streamkap’s IP addresses.

Note: If your SQL Server already accepts traffic from anywhere in the world, you can skip ahead to the "Adding RDS SQL Server Database as a Source Connector" section.

Safelisting Streamkap’s IP Address

Streamkap’s dedicated IP addresses are:

| Region | Service IP |
| --- | --- |
| Oregon (us-west-2) | 52.32.238.100 |
| North Virginia (us-east-1) | 44.214.80.49 |
| Europe - Ireland (eu-west-1) | 34.242.118.75 |
| Asia Pacific - Sydney (ap-southeast-2) | 52.62.60.121 |

When signing up, Oregon (us-west-2) is set as the default region. If you require a different region, let us know. For more details about our IP addresses, please visit this link.

  • To safelist one of our IPs, access your RDS SQL Server instance's security group (type "CIDR/IP - Inbound") and select "Edit inbound rules."
  • Click "Add rule," choose "SQL Server-RDS" as the type, select "Custom" as the source, and enter the relevant IP address followed by "/32." For example, to add the IP address for North Virginia, you would enter "44.214.80.49/32" as shown in the image below.


Adding RDS SQL Server Database as a Source Connector

Before adding SQL Server RDS as a source connector, ensure the following:

  • IP Address Safelisting: Ensure that your IP address is added to the safelist, or configure your SQL Server RDS instance to allow traffic from Streamkap by default. This step is essential for establishing a secure connection between Streamkap and your RDS instance.
  • SQL Configuration: Verify that you have executed all required SQL commands from the "Configuring RDS SQL Server for Streamkap Integration" section. This includes setting up users, roles, permissions, and enabling Change Data Capture (CDC) on the required tables.

Both of these steps are critical for establishing a seamless connection between AWS RDS SQL Server and Streamkap.

Note: If you're using SQL Server 2016 or later, you can integrate with Streamkap using CDC. For older versions, additional configuration may be required.
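
A quick way to check the engine version before proceeding:

```sql

SELECT @@VERSION;  -- should report SQL Server 2016 or later

```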

Step 1: Log in to Streamkap

  • Sign in to Streamkap. This will direct you to your dashboard.

Note: You need admin or data admin privileges to proceed with the steps below.

Step 2: Set Up a SQL Server Source Connector

  • On the left sidebar, click on “Connectors,” then select the “Sources” tab. 
  • Click the “+ Add” button to begin setting up your SQL Server source connector.
  • Enter "SQL Server" in the search bar and select "SQL Server RDS."
  • This will take you to the new connector setup page.

  • You’ll be asked to provide the following details:


    • Name: Choose a name for the SQL Server source in Streamkap.
    • Endpoint: Copy the endpoint from the "Connectivity & Security" tab of your SQL Server RDS instance.
    • Port: The default port is 1433. If you've changed the port, update it accordingly.
    • Signal Table Schema: Enter "streamkap" if you followed the previous steps, or provide your own schema name.
    • Username: Use "streamkap_user", which was created earlier.
    • Password: Enter the password for that user.
    • Source Database: Use "source-database" (or specify a custom database name if necessary).

If you choose not to use an SSH Tunnel, click "Next" and proceed to step 3.

If you opt to connect to your AWS RDS SQL Server through an SSH Tunnel, ensure the following:

  • Your bastion host / SSH Tunnel server is connected to your SQL Server RDS instance.
  • Your bastion host / SSH tunnel server allows traffic from any source or has the Streamkap IP addresses safelisted.

Log in to your Linux SSH Tunnel host and execute the commands below. If you are using a Windows server, refer to our SSH Tunnel guide for Windows.

```bash

# Create group Streamkap:
sudo groupadd Streamkap

# Create user Streamkap:
sudo useradd -m -g Streamkap Streamkap

# Switch to the Streamkap user:
sudo su - Streamkap

# Create the .ssh directory:
mkdir ~/.ssh

# Set permissions:
chmod 700 ~/.ssh

# Change to the .ssh directory:
cd ~/.ssh

# Create the authorized_keys file:
touch authorized_keys

# Set permissions:
chmod 600 authorized_keys

```

  • After running the commands above, fetch the "SSH Public Key" from Streamkap's SQL Server source as shown below.
Fetching public access key
  • Replace <key> with the SSH Public Key copied earlier and run the commands below.

```bash

# Using the key given to you by Streamkap, add it to the authorized_keys file.
echo "<key>" >> ~/.ssh/authorized_keys

# Allow port forwarding in the SSH daemon configuration:
# edit /etc/ssh/sshd_config and set "AllowTcpForwarding yes"
sudo nano /etc/ssh/sshd_config

# Restart the SSH service
sudo systemctl restart sshd

```

After your SSH Tunnel server is restarted, enter the public IP address of your SSH Tunnel server as the SSH Host in Streamkap and click Next. You’ll then be directed to the schema and table selection page, like the one below.

Step 3: Plug in Schema and Table Names

If you've followed the guide, your SQL Server should now have a "Music" table within the "music_data" schema.

  • If you're using a different schema, plug in your own details.
  • Once you've updated the details, click Save to proceed.
  • Once all the steps are completed correctly, you should be able to successfully create a SQL Server source connector as shown below.

In case you encounter any fatal errors, they could be caused by one of the following:

  • Our IP addresses were not safelisted.
  • Incorrect credentials.
  • Incorrect database, schema, or table name.
  • Insufficient permissions to query the table.

Adding Iceberg as a destination Connector

  • From the side navigation bar, go to Connectors, then switch to the Destinations tab. Search for Iceberg in the list and select it to begin the setup process.
  • Enter streamkap_datawarehouse as the name of the destination — if you’ve followed the naming conventions from this guide.
  • Select glue as the catalog type to match the setup we've configured earlier.
  • Paste the IAM Role ARN (arn:aws:iam::...:role/streamkap_role) that you created in the previous steps.
  • Choose us-west-2 as the region — assuming you're using the same region as outlined in this guide.
  • Enter the S3 bucket path from the earlier setup (for example, s3a://<your-s3-bucket-name>).
  • Set the schema name to streamkap_warehouse, which corresponds to the Glue database we created.
  • Select insert as the insert mode to define how records will be written into Iceberg.
  • Click “Save” to complete the destination setup.

  • Once the configuration has completed successfully, verify that the connector's status shows as Active.

Adding Streamkap pipeline

  • Navigate to “Pipelines” on the left side and then press “+Create” in order to create a pipeline between source and destination.
  • Select source and destination and click on “Next” on the bottom right corner.
  • Select all the schemas or the specific schema you wish to transfer.
  • Assign a name to the pipeline, add the appropriate tags as needed, enable the snapshot feature for the topic, and then click on the save button to apply the changes.

  • Once the pipeline is successfully created, it should appear like the screenshot below, with its status displayed as “Active.”

  • Download the files from the S3 bucket, convert the data into a CSV or table format, and then verify the accuracy of the resulting data.

By design, your table in the destination will have the following meta columns in addition to the regular columns.

| Column Name | Description |
| --- | --- |
| _STREAMKAP_SOURCE_TS_MS | Timestamp in milliseconds for the event within the source log |
| _STREAMKAP_TS_MS | Timestamp in milliseconds for the event once Streamkap receives it |
| __DELETED | Indicates whether this event/row has been deleted at source |
| _STREAMKAP_OFFSET | An offset value relative to the logs we process; useful when debugging any issues that arise |
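
With these meta columns you can, for example, exclude rows that were deleted at the source when querying the destination. A minimal Athena-style sketch, assuming this guide's names and that __DELETED is stored as the strings 'true'/'false' (engines typically treat these identifiers case-insensitively):

```sql

-- Exclude source-deleted rows and inspect event timing
-- (database, table, and column handling assume this guide's conventions)
SELECT artist_band, song, _streamkap_source_ts_ms
FROM streamkap_warehouse.music
WHERE __deleted = 'false';

```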

What's Next? 

Thank you for reading this guide. If you have other sources and destinations to connect in near real-time, check out the following guides.

For more information on connectors, please visit here.