Set Up MySQL CDC with Debezium and Redpanda
This example demonstrates how to use Debezium to capture the changes made to MySQL in real time and stream them to Redpanda.
This ready-to-run Docker Compose setup contains the following containers:
-
mysql
container with thepandashop
database, containing a single table,orders
-
debezium
container capturing changes made to theorders
table in real time. -
redpanda
container to ingest change data streams produced bydebezium
For more information about the pandashop
database schema, see the /data/mysql_bootstrap.sql
file.
Prerequisites
You must have Docker and Docker Compose installed on your host machine.
Run the lab
-
Download the following Docker Compose file on your local file system.
Reveal the Docker Compose file
docker-compose.yml
version: '3.7' name: redpanda-cdc-mysql volumes: redpanda: null services: mysql: image: mysql/mysql-server:8.0.27 hostname: mysql container_name: mysql ports: - 3306:3306 environment: - MYSQL_ROOT_PASSWORD=debezium - MYSQL_USER=mysqluser - MYSQL_PASSWORD=mysqlpw volumes: - ./data/mysql.cnf:/etc/mysql/conf.d - ./data/mysql_bootstrap.sql:/docker-entrypoint-initdb.d/mysql_bootstrap.sql redpanda: image: docker.redpanda.com/redpandadata/redpanda:v23.3.13 container_name: redpanda command: - redpanda start # Mode dev-container uses well-known configuration properties for development in containers. - --mode dev-container # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system. - --smp 1 - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092 # Address the broker advertises to clients that connect to the Kafka API. # Use the internal addresses to connect to the Redpanda brokers # from inside the same Docker network. # Use the external addresses to connect to the Redpanda brokers # from outside the Docker network. - --advertise-kafka-addr internal://redpanda:9092,external://localhost:19092 - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082 # Address the broker advertises to clients that connect to the HTTP Proxy. - --advertise-pandaproxy-addr internal://redpanda:8082,external://localhost:18082 - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081 # Redpanda brokers use the RPC API to communicate with each other internally. - --rpc-addr redpanda:33145 - --advertise-rpc-addr redpanda:33145 ports: - 18081:18081 - 18082:18082 - 19092:19092 - 19644:9644 volumes: - redpanda:/var/lib/redpanda/data healthcheck: test: ["CMD-SHELL", "rpk cluster health | grep -E 'Healthy:.+true' || exit 1"] interval: 15s timeout: 3s retries: 5 start_period: 5s debezium: image: debezium/connect:2.4 container_name: debezium environment: BOOTSTRAP_SERVERS: redpanda:9092 GROUP_ID: 1 CONFIG_STORAGE_TOPIC: connect_configs OFFSET_STORAGE_TOPIC: connect_offsets depends_on: [mysql, redpanda] ports: - 8083:8083
-
Set the
REDPANDA_VERSION
environment variable to the version of Redpanda that you want to run. For all available versions, see the GitHub releases.For example:
export REDPANDA_VERSION=23.3.13
-
Run the following in the directory where you saved the Docker Compose file:
docker compose up -d
When the
mysql
container starts, the/data/mysql_bootstrap.sql
file creates thepandashop
database and theorders
table, followed by seeding the ` orders` table with a few records. -
Log into MySQL:
docker compose exec mysql mysql -u mysqluser -p
Provide
mysqlpw
as the password when prompted. -
Check the content inside the
orders
table:use pandashop; show tables; select * from orders;
This is your source table.
-
Exit MySQL:
exit
-
While Debezium is up and running, create a source connector configuration to extract change data feeds from MySQL.
docker compose exec debezium curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d ' { "name": "mysql-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mysql", "database.port": "3306", "database.user": "debezium", "database.password": "dbz", "database.server.id": "184054", "topic.prefix": "dbz", "database.include.list": "pandashop", "schema.history.internal.kafka.bootstrap.servers": "redpanda:9092", "schema.history.internal.kafka.topic": "schemahistory.pandashop" } }'
You should see the following in the output:
HTTP/1.1 201 Created Date: Mon, 12 Feb 2024 16:37:09 GMT Location: http://localhost:8083/connectors/mysql-connector Content-Type: application/json Content-Length: 489 Server: Jetty(9.4.51.v20230217)
The
database.*
configurations specify the connectivity details tomysql
container. The parameter,schema.history.internal.kafka.bootstrap.servers
points to theredpanda
broker the connector uses to write and recover DDL statements to the database schema history topic. -
Wait a minute or two until the connector gets deployed inside Debezium and creates the initial snapshot of change log topics in Redpanda.
-
Check the list of change log topics in
redpanda
by running:docker compose exec redpanda rpk topic list
The output should contain two topics with the prefix
dbz.*
specified in the connector configuration. The topicdbz.pandashop.orders
holds the initial snapshot of change log events streamed fromorders
table.NAME PARTITIONS REPLICAS connect-status 5 1 connect_configs 1 1 connect_offsets 25 1 dbz 1 1 dbz.pandashop.orders 1 1 schemahistory.pandashop 1 1
-
Monitor for change events by consuming the
dbz.pandashop.orders
topic:docker compose exec redpanda rpk topic consume dbz.pandashop.orders
-
While the consumer is running, open another terminal to insert a record to the
orders
table.export REDPANDA_VERSION=23.3.13 docker compose exec mysql mysql -u mysqluser -p
Provide
mysqlpw
as the password when prompted. -
Insert the following record:
use pandashop; INSERT INTO orders (customer_id, total) values (5, 500);
This will trigger a change event in Debezium, immediately publishing it to dbz.pandashop.orders
Redpanda topic, causing the consumer to display a new event in the console. That proves the end-to-end functionality of your CDC pipeline.
Clean up
To shut down and delete the containers along with all your cluster data:
docker compose down -v
Next steps
Now that you have change log events ingested into Redpanda. You process change log events to enable use cases such as:
-
Database replication
-
Stream processing applications
-
Streaming ETL pipelines
-
Update caches
-
Event-driven Microservices