Tutorials

How to Stream Data From AWS RDS MySQL

AUTHOR BIO
Ricky has 20+ years experience in data, devops, databases and startups.

January 30, 2025

Introduction

In today’s fast-paced business world, speed isn’t just an advantage—it’s a necessity. Decisions need to be made in real-time, but let’s face it: traditional data processing is clunky, slow, and overly complicated.

Following this guide, developers can deploy application monitoring and log analysis on ClickHouse, leveraging real-time data streaming from AWS MySQL. By capturing transactional and event data from MySQL and streaming it to ClickHouse, teams can build a high-performance analytics pipeline capable of handling large-scale logs, performance metrics, and security events.

Guide Sections:

Prerequisites You will need Streamkap and Aws accounts
Setting up a New RDS MySQL From Scratch This step will help you to setup MySQL inside AWS
Configuring an Existing RDS MySQL For existing MySQL DB, you will have to modify its configuration to make it Streamkap-compatible
Creating a New Clickhouse Account This step will help you to set up a Clickhouse account.
Fetching credentials from existing Clickhouse Destination For the existing Clickhouse account, you will have to modify its permissions to allow Streamkap to write data
Streamkap Setup Adding MySQL as a source, adding Clickhouse as a destination, and finally connecting them using a data pipe

Ready to supercharge your data game? Let’s get started!

Prerequisites

To successfully follow this guide, ensure you have the items listed below.

1. Streamkap Account: Ensure you have an active Streamkap account with admin or data admin privileges. If you don't have one yet; signup here.

2. ClickHouse Account: You must have an active ClickHouse account with admin privileges. If you don’t have one yet, you can sign up here

3. Amazon AWS Account: To create, configure, or update an AWS RDS instance, you’ll need an active Amazon AWS account with networking and core RDS permissions. If you don’t have one yet, you can register here.

Source Set Up Start 

AWS RDS MySQL is one of the most widely used production databases globally, renowned for its extensive adoption and daily usage. Amazon has made it highly user-friendly, enabling new users to set up an RDS MySQL instance from scratch in just a few minutes. Additionally, if you already have an existing instance, adding the necessary configurations for Streamkap streaming is straightforward.

In this section, we will explore various methods to set up and configure AWS RDS MySQL to ensure seamless compatibility with Streamkap.

Setting up a New Source Database from Scratch

Note: To create a new AWS RDS MySQL instance, you will need IAM permissions for core RDS and networking activities such as creating and managing RDS instances, parameter groups, subnet groups, networking, security, monitoring, and optionally configuring encryption and tagging.

If your permissions are limited, reach out to your AWS administrator to grant you the necessary RDS and networking permissions.

Step 1:  Log in to the AWS Management Console
  • Log in to your AWS Management Console and type “RDS” on the search bar.
  • Click on “RDS” as shown below
AWS Management Console
Step 2. Create an RDS instance for the source database
  • Step 1 will take you to the RDS dashboard. From there, click on either "Databases" in the top left corner or "DB Instances" under the Resources section as depicted below.
Navigate to database
  • In the RDS Databases section, select the Create database option.

Note: To integrate with Streamkap, ensure that the selected region is one of the following:

Oregon (us-west-2)

North Virginia (us-east-1)

Europe Ireland (eu-west-1)

Asia Pacific - Sydney (ap-southeast-2)

Create databases
  • Select Standard create 
  • Next, select the MySQL engine under Engine options.
Create database
  • Scroll down and ensure the Engine version is below 8.4.
Select engine version
  • Under the Templates section choose between “Production”, “Dev/Test”, “Free tier” based on your need.
  • If you are deploying sub-second latency streaming for the first time, we recommend selecting the "Free tier" option. This allows you to gain practical experience with the setup process without incurring additional cloud costs, as demonstrated in the example below.
Template selection
  • Enter a DB instance identifier, for example source-database
  • Under Credentials Settings, select Self managed and enter a password for the database.
Database credentials
  • Enable Public access and opt to create a new VPC security group.
Database Connectivity settings
  • Under Additional configuration, enter the  Initial database name.
Additional configuration section
  • Click Create database to start the database creation process.
create the database
Databases list
Step 3. Create a Parameter group for Streamkap compatibility

By default, an AWS RDS MySQL instance is not compatible with Streamkap. To enable sub-second latency streaming, minor adjustments must be made to the database to make it compatible.

You are required to prepare your MySQL database on AWS RDS for Streamkap streaming by creating a custom parameter group. Parameter groups are configuration templates that define the behavior of your database instance.

To capture and stream data changes Streamkap needs binary logging and a specific binlog format to be enabled. By creating and applying a custom parameter group for Streamkap you ensure your RDS instance is configured to track real-time changes and make them available for streaming. This is the foundation step to have a reliable and efficient data pipeline to ClickHouse.

In this section of the guide, we will

  1. Create a new parameter group
  2. Modify gtid-mode = ON
  3. Modify enforce_gtid_consistency = ON
  4. Modify binlog_format = ROW
  5. Modify binlog_row_image = full
  6. Attach the new parameter group to our MySQL instance

  • On your RDS dashboard click on “Parameter groups” on the left side as shown in the following screenshot.
Navigation to Parameter groups
  • Open the Parameter groups section in AWS RDS and click Create parameter group.
Creating Parameter groups
  • Give your parameter group a name and a description. Select MySQL Community as the Engine type and mysql8.0 as the Parameter group family.
Configuration details
  • Click Create and verify that the parameter group appears in the list.
Parameter groups
  • Open the newly created parameter group and click Edit.
Edit parameter group
  • On the search bar
    • Type “gtid-mode” and add “ON” as value as shown below.
gtid-mode = ON
  • Type “enforce_gtid_consistency” and add “ON” as value as shown below.
enforce_gtid_consistency = ON
  • Type “binlog_format” and add “ROW” as value as shown below.
binlog_format = ROW
  • Type “binlog_row_image” and add “full” as value as shown below.
binlog_row_image = full
  • Click Save changes.
  • Back in your RDS instance, click Modify.
Modify RDS instance
  • Scroll down to Additional Configuration and select the new parameter group, then click Continue.
Apply modifications immediately
  • Choose to apply your changes immediately and click Modify DB instance.
Step 4: Test Your MySQL Instance for Streamkap Compatibility

It’s crucial to verify that changes made to the MySQL instance are properly reflected within the database. To do this, connect to your RDS MySQL instance using tools like DBeaver and run the following commands to ensure the functionality is working as expected.

```sql

SELECT @@global.gtid_mode, @@global.enforce_gtid_consistency, @@global.binlog_format, @@global.binlog_row_image;
Querying MySQL global variables

Note: If you encounter a timeout error while accessing your AWS MySQL instance from your local machine, it’s possible that the security group is not allowing inbound traffic from your local machine. To resolve this, you’ll need to modify the inbound rules of your security group to permit access from your local machine to your cloud MySQL instance.

Configuring an Existing Source Database for Streamkap Compatibility

If you already have an existing AWS RDS MySQL instance, it may or may not be compatible with Streamkap. To determine if it is compatible, we will perform a test. The test must pass to confirm that the instance is compatible.

To perform this test, connect to your RDS MySQL instance using tools like DBeaver and run the following command to ensure functionality is working as expected.

```sql

SELECT @@global.gtid_mode, @@global.enforce_gtid_consistency, @@global.binlog_format, @@global.binlog_row_image;
Querying MySQL global variables

The test must return the expected values (gtid_mode = ON, enforce_gtid_consistency = ON, binlog_format = ROW, and binlog_row_image = FULL) for the RDS instance to be Streamkap compatible. If the test fails, adjustments to the RDS parameter group may be required.

If the test does not return the expected values, you will need to modify the existing custom RDS parameter group or create a new parameter group with the necessary changes to enable Streamkap compatibility. Specifically, you'll need to configure the following parameters.

If you already have an existing custom RDS parameter group skip to step 2.

Step 1: Create a Customer RDS Parameter Group
  • On your RDS dashboard click on “Parameter groups” on the left side and then click on “Create parameter groups” as shown in the following screenshot
Parameter groups
  • Give your parameter group a name and optionally a description. Select MySQL Community as the Engine type and mysql8.0 as the Parameter group family.
Configuration details
  • Click Create and verify that the parameter group appears in the list.
Parameter groups
Step 2: Configure the RDS Parameter Group
  • Open the parameter group and click Edit.
Edit parameter group
  • On the search bar
    • Type “gtid-mode” and add “ON” as value as shown below.
gtid-mode = ON
  • Type “enforce_gtid_consistency” and add “ON” as value as shown below.
enforce_gtid_consistency = ON
  • Type “binlog_format” and add “ROW” as value as shown below.
binlog_format = ROW
  • Type “binlog_row_image” and add “full” as the value as shown below.
”binlog_row_image = full
  • Click Save Changes.
  • Go back to your database instance and click Modify.
Source database
  • Scroll down to Additional Configuration and change the DB parameter group to your new group.
Modifying an instance
  • Scroll down and click Continue.
Modify DB instance
  • Click Apply immediately and then Modify DB instance.
Summary display

With these changes, Streamkap will be able to receive data streams from MySQL and you can use the database as a source.

Destination Warehouse Set Up Start

ClickHouse is a fast and scalable columnar database management system that is widely used for real-time data analytics. It allows users to efficiently store and query large volumes of data, making it ideal for applications that require high-performance analytics. ClickHouse is known for its ability to handle complex queries with low latency, making it a popular choice for many cloud applications.

Creating a New Destination account 

Now that the source database is set up and is streaming data, the next step is to configure the destination data warehouse which is ClickHouse. ClickHouse is perfect for storing and analyzing big volumes of streaming data. 

As the endpoint for your Streamkap pipeline, ClickHouse will get real-time data updates so you can do advanced analytics and get valuable insights. In this section, we’ll walk you through the process of setting up your ClickHouse instance to work with MySQL on AWS RDS through Streamkap and handle incoming data streams.

  • After signing up for ClickHouse and providing your personal details, enter the data warehouse details and select AWS as your provider. Select either a Production or Development deployment and click Create service.
Create service in ClickHouse
  • On the navigation menu, select Connect to obtain your connection details.
ClickHouse navigation
  • The dialog presents all the details you need to set up your destination connection in Streamkap.
Connection settings

To create a database and a dedicated Streamkap user, open the SQL Console and execute queries like the following.

```sql

-- Create database 
CREATE database streamkap_datawarehouse;

-- Create Streamkap role and user 
CREATE ROLE STREAMKAP_ROLE; 
GRANT SELECT, CREATE, SHOW, INSERT, ALTER ON streamkap_datawarehouse.* TO STREAMKAP_ROLE; 
CREATE USER streamkap_user IDENTIFIED BY '{password}'; 
GRANT STREAMKAP_ROLE TO streamkap_user; 

With both a source database and a destination data warehouse, you now have both parts needed to set up a data stream.

Fetching credentials from existing Destination warehouse

If you already have a ClickHouse instance set up, fetch the credentials to connect to it with Streamkap. These credentials are host, port, username, and password. Without these credentials, the connection to your ClickHouse data warehouse will not be possible. 

  • Getting the correct credentials is a crucial step in the integration process as Streamkap will stream real-time data updates directly into your ClickHouse instance. 
  • Select Connect from the ClickHouse navigation menu.
ClickHouse navigation
  • The credentials and connection details will be visible in the dialog.
ClickHouse connection details

The above credentials will be used to set up the destination connector in Streamkap. You only need the hostname part of the URL as https:// is implied and the port is specified separately.

Streamkap Set Up

To connect Streamkap to AWS MySQL, we need to ensure the database is configured to accept traffic from Streamkap by safelisting Streamkap’s IP addresses.

Note: If MySQL accepts traffic from anywhere in the world, you can move on to the “Configuring  MySQL for Streamkap Integration” section.

Safelisting Streamkap’s IP Address

Streamkap’s dedicated IP addresses are given below:

Service IPs
Oregon (us-west-2)
52.32.238.100
North Virginia (us-east-1) 44.214.80.49
Europe Ireland (eu-west-1)
34.242.118.75
Asia Pacific - Sydney (ap-southeast-2) 52.62.60.121

When signing up, Oregon (us-west-2) is the default region. Let us know if you need it to be elsewhere. For more information about our IP addresses please visit this link.

  • Start by locating the VPC security group assigned to our RDS instance and open it up.
Connectivity & security tab
  • Click Edit inbound rules.
Inbound rules
  • In order to safelist one of our IPs, open your RDS MySQL instance’s security group with type = “CIDR/IP - Inbound” and click “Edit inbound rules”.
  • Click on “Add rule”, choose “MYSQL/Aurora” as type, “Custom” as source, and paste any of our relevant IP addresses followed by “/32”. For instance, if you want to add IP address for Oregon in the custom box it would look like “52.32.238.100/32” as shown in the below image.
  • Also add your own IP address to allow access from DBeaver.
  • Finally, click Save rules.
Editing inbound rules

Configuring Source Database for Streamkap Integration

Streamkap recommends creating a dedicated user and role within your MySQL instance to facilitate secure and controlled data streaming. This approach ensures granular access management and follows best practices for database security.

Access your MySQL database via a tool like Dbeaver and run the following code

Step 1: Create a dedicated user and schema for Streamkap

```sql

CREATE USER 'streamkap_user'@'%' IDENTIFIED BY '{password}'; 
CREATE SCHEMA IF NOT EXISTS streamkap;

Note: You must plug in your own password

Step 2: Grant permissions to the user

```sql

GRANT REPLICATION CLIENT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, SELECT ON *.* TO 'streamkap_user'@'%';
Step 3: Enabling Streamkap heartbeat schema and table

Streamkap will use the following schema and table to manage heartbeats.

```sql

-- Switch to the streamkap schema 
USE streamkap; 

-- Create the heartbeat table with id, text, and last_update fields 
CREATE TABLE streamkap_heartbeat ( 
    id INT AUTO_INCREMENT PRIMARY KEY, 
    text TEXT, 
    last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP 
); 

-- Grant permission to the Streamkap user 
GRANT SELECT, INSERT, UPDATE ON streamkap.streamkap_heartbeat TO 'streamkap_user'; 

-- Insert the first row into the heartbeat table 
INSERT INTO streamkap_heartbeat (text) VALUES ('test_heartbeat'); 
Step 4. Creating a data table for streaming

Let’s create a sample table from which Streamkap will stream data. If you already have an existing table(s), you would like to stream from, you can skip this step.

```sql

CREATE TABLE `orders` ( 
    `order_id` INT AUTO_INCREMENT PRIMARY KEY, 
    `customer_id` INT NOT NULL, 
    `order_date` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
    `status` ENUM('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled') DEFAULT 'Pending', 
    `total_amount` DECIMAL(10, 2) NOT NULL, 
    `payment_method` ENUM('Credit Card', 'PayPal', 'Bank Transfer', 'Cash') DEFAULT 'Credit Card', 
    `shipping_address` TEXT NOT NULL, 
    `created_at` DATETIME DEFAULT CURRENT_TIMESTAMP, 
    `updated_at` DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP 
);

You now have a table that can be streamed to ClickHouse. We will use this table to set up a data stream to ClickHouse.

Adding RDS MySQL Database as a Source Connector

Now that you have your source database and destination data warehouse set up, it’s time to set up Streamkap, the piece that connects the two together and allows for real-time data streaming. We start by adding a source database connector in Streamkap to capture changes from your AWS MySQL instance. 

The source connector allows Streamkap to monitor your database, detect changes, and send them to your ClickHouse warehouse. By setting this up correctly you’re creating a secure and reliable link between your source database and Streamkap so changes are captured accurately and streamed in real time. 

In this section, we’ll go through the steps to create and configure the source connector, so you can get your data pipeline up and running smoothly.

Visit the Streamkap sign-in link and log in. You will land on the dashboard page.

Note: You need admin or data admin privileges to proceed with the subsequent steps.

  • Open the Connectors section in Streamkap and Click Add under the default tab of Sources.
adding source
  • Enter your MySQL connection details. The Hostname field in Streamkap is the same as the Endpoint field in RDS.
  • Ensure the heartbeat is properly configured by turning it on and plugging in the heartbeat table name.
  • Click Next.
Streamkap connection details
  • Select the databases and tables you wish to stream to ClickHouse. Click Save.
Selecting database and tables
  • You should see your new source connector in the list.
Source connection status

Adding a Destination Connector

After you’ve set up the source database connector, you need to set up the destination connector in Streamkap. This connector connects Streamkap to your ClickHouse instance and writes data from your MySQL database to the correct tables in ClickHouse. 

The destination connector is a key part of the puzzle that allows Streamkap to relay changes to your data warehouse in real time and maintains data integrity. While this sets up the connection to the destination, the full pipeline is still not live yet. 

In this section, we’ll walk you through setting up the destination connector and bring you one step closer to your real-time data pipeline.

  • In the Connectors section of Streamkap, click on the Destinations tab.
Streamkap destinations
  • Click the Add button.
Destinations tab
  • Select Clickhouse from the list of available connectors.
Destination connector
  • Enter your ClickHouse connection details and click Save.
Streamkap destination settings

Adding a Pipeline

Now that we have the source and destination connectors in place, the final step is to create the pipeline in Streamkap. The pipeline is the backbone of your data streaming setup, connecting the source database to the destination warehouse and controlling the flow of data between them. 

By configuring the pipeline, you define how the data is captured, transformed if needed, and delivered in real-time to ClickHouse. 

This is the final step of the setup and then your pipeline will be ready for production workloads. 

In this section, we will walk you through creating and activating your Streamkap pipeline and bring your data streaming workflow to life.

  • Start by navigating to Pipelines from the menu in Streamkap. Click Create.
Empty list of pipelines
  • Select your source and destination database connectors.
Selecting source and destination
  • Select the relevant databases and tables to be used for the pipeline. Click Save.
Selecting source databases and tables
  • Confirm that your pipeline is Active.
Newly created pipeline

Populate your “orders” table at source and you will see the values reflected in Clickhouse in near real-time as depicted below.

By design, Your “Table” in the destination will have the following meta columns apart from the regular columns.

Column Name Description
_STREAMKAP_SOURCE_TS_MS Timestamp in milliseconds for the event within the source log
_STREAMKAP_TS_MS Timestamp in milliseconds for the event once Streamkap receives it
__DELETED Indicates whether this event/row has been deleted at source
_STREAMKAP_OFFSET This is an offset value in relation to the logs we process. It can be useful in the debugging of any issues that may arise

What's Next? 

Thank you for reading this guide. If you have other sources and destinations to connect in near real-time check out the following guides. 

For more information on connectors please visit here