Set Up Postgres CDC with Debezium and Redpanda

This example demonstrates using Debezium to capture the changes made to Postgres in real time and stream them to Redpanda.

This ready-to-run Docker Compose setup contains the following containers:

  • postgres container with the pandashop database, containing a single table, orders

  • debezium container that captures changes made to the orders table in real time

  • redpanda container that ingests the change data streams produced by debezium

For more information about the pandashop schema, see the /data/postgres_bootstrap.sql file.

Example architecture

Prerequisites

You must have Docker and Docker Compose installed on your host machine.

Run the lab

  1. Download the following Docker Compose file to your local file system.

    docker-compose.yml
    version: '3.7'
    name: redpanda-cdc-postgres
    volumes:
      redpanda: null
    services:
      postgres:
        image: debezium/postgres:16
        container_name: postgres
        ports:
          - 5432:5432
        healthcheck:
          test: "pg_isready -U postgresuser -d pandashop"
          interval: 2s
          timeout: 20s
          retries: 10
        environment:
          - POSTGRES_USER=postgresuser
          - POSTGRES_PASSWORD=postgrespw
          - POSTGRES_DB=pandashop
          - PGPASSWORD=postgrespw
        volumes:
          - ./data:/docker-entrypoint-initdb.d
      redpanda:
        image: docker.redpanda.com/redpandadata/redpanda:v${REDPANDA_VERSION:-23.3.12}
        container_name: redpanda
        command:
          - redpanda start
          # Mode dev-container uses well-known configuration properties for development in containers.
          - --mode dev-container
          # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
          - --smp 1
          - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
          # Address the broker advertises to clients that connect to the Kafka API.
          # Use the internal addresses to connect to the Redpanda brokers
          # from inside the same Docker network.
          # Use the external addresses to connect to the Redpanda brokers
          # from outside the Docker network.
          - --advertise-kafka-addr internal://redpanda:9092,external://localhost:19092
          - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
          # Address the broker advertises to clients that connect to the HTTP Proxy.
          - --advertise-pandaproxy-addr internal://redpanda:8082,external://localhost:18082
          - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
          # Redpanda brokers use the RPC API to communicate with each other internally.
          - --rpc-addr redpanda:33145
          - --advertise-rpc-addr redpanda:33145
        ports:
          - 18081:18081
          - 18082:18082
          - 19092:19092
          - 19644:9644
        volumes:
          - redpanda:/var/lib/redpanda/data
        healthcheck:
          test: ["CMD-SHELL", "rpk cluster health | grep -E 'Healthy:.+true' || exit 1"]
          interval: 15s
          timeout: 3s
          retries: 5
          start_period: 5s
      debezium:
        image: debezium/connect:2.4
        container_name: debezium
        environment:
          BOOTSTRAP_SERVERS: redpanda:9092
          GROUP_ID: 1
          CONFIG_STORAGE_TOPIC: connect_configs
          OFFSET_STORAGE_TOPIC: connect_offsets
        depends_on: [postgres, redpanda]
        ports:
          - 8083:8083
  2. Set the REDPANDA_VERSION environment variable to the version of Redpanda that you want to run. For all available versions, see the GitHub releases.

    For example:

    export REDPANDA_VERSION=23.3.12
  3. Run the following in the directory where you saved the Docker Compose file:

    docker compose up -d

    When the postgres container starts, the /data/postgres_bootstrap.sql file creates the pandashop database and the orders table, then seeds the orders table with a few records.

  4. Log in to Postgres:

    docker compose exec postgres psql -U postgresuser -d pandashop
  5. Check the contents of the orders table:

    select * from orders;

    This is the source table.

  6. While Debezium is up and running, create a source connector configuration to extract change data feeds from Postgres:

    docker compose exec debezium curl -H 'Content-Type: application/json' debezium:8083/connectors --data '
    {
      "name": "postgres-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "postgresuser",
        "database.password": "postgrespw",
        "database.dbname" : "pandashop",
        "database.server.name": "postgres",
        "table.include.list": "public.orders",
        "topic.prefix" : "dbz"
      }
    }'

    Notice the database.* properties, which specify the connection details for the postgres container. Wait a minute or two for Debezium to deploy the connector and write the initial snapshot of the orders table to a change log topic in Redpanda.

  7. Check the list of change log topics in Redpanda:

    docker compose exec redpanda rpk topic list

    The output should contain a topic with the dbz prefix, which is set by topic.prefix in the connector configuration. The topic dbz.public.orders holds the initial snapshot of change log events streamed from the orders table.

    NAME               PARTITIONS  REPLICAS
    connect-status     5           1
    connect_configs    1           1
    connect_offsets    25          1
    dbz.public.orders  1           1
  8. Monitor for change events by consuming the dbz.public.orders topic:

    docker compose exec redpanda rpk topic consume dbz.public.orders
  9. While the consumer is running, open another terminal and log in to Postgres:

    export REDPANDA_VERSION=23.3.12
    docker compose exec postgres psql -U postgresuser -d pandashop
  10. Insert the following record:

    INSERT INTO orders (customer_id, total) values (5, 500);

This triggers a change event in Debezium, which immediately publishes it to the dbz.public.orders Redpanda topic, and the consumer displays the new event in the console. This confirms that your CDC pipeline works end to end.
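
To make the shape of such an event more concrete, the following Python sketch decodes one record as printed by rpk topic consume and summarizes the Debezium envelope. It assumes the Kafka Connect JSON converter with schemas enabled, so the change data sits under payload, and it also accepts a bare envelope. The function name summarize_change_event and the sample record are illustrative, hand-written values rather than captured output. Real orders rows may carry additional columns, and how values are encoded depends on the column types.

```python
import json

# Debezium operation codes and their meanings.
OPERATIONS = {
    "c": "create",
    "u": "update",
    "d": "delete",
    "r": "snapshot read",
    "t": "truncate",
}


def summarize_change_event(record_json):
    """Summarize one record printed by `rpk topic consume`.

    rpk prints each record as a JSON object whose "value" field holds the
    message value as a string; here that string is the Debezium event.
    """
    record = json.loads(record_json)
    value = record.get("value")
    if not value:
        # By default, Debezium follows a delete with a tombstone record
        # that has no value.
        return {"operation": "tombstone", "before": None, "after": None}
    event = json.loads(value)
    # With schemas enabled, the envelope is nested under "payload".
    envelope = event.get("payload", event)
    op = envelope.get("op")
    return {
        "operation": OPERATIONS.get(op, op),
        "before": envelope.get("before"),
        "after": envelope.get("after"),
    }


# Hand-written sample resembling the event produced by the INSERT above.
sample_value = json.dumps({
    "payload": {
        "before": None,
        "after": {"customer_id": 5, "total": 500},
        "op": "c",
    }
})
sample_record = json.dumps({
    "topic": "dbz.public.orders",
    "value": sample_value,
    "partition": 0,
    "offset": 0,
})

summary = summarize_change_event(sample_record)
print(summary["operation"], summary["after"])
```

An op value of r marks rows read during the initial snapshot, while c, u, and d mark inserts, updates, and deletes captured afterwards.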

Clean up

To shut down and delete the containers along with all your cluster data:

docker compose down -v

Next steps

Now that you have change log events ingested into Redpanda, you can process them to enable use cases such as:

  • Database replication

  • Stream processing applications

  • Streaming ETL pipelines

  • Cache updates

  • Event-driven microservices
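
As one illustration of the cache use case, the sketch below applies Debezium change events to an in-memory Python dictionary. It is a minimal example under stated assumptions: events use the JSON envelope with op, before, and after fields, and the table's primary key column is named order_id. That column name is hypothetical, because the orders schema is defined in postgres_bootstrap.sql and not shown here. The apply_change function is likewise illustrative and is not part of Debezium or Redpanda.

```python
import json


def apply_change(cache, raw_value, key_column="order_id"):
    """Apply one Debezium change event to a dict that mirrors the table.

    key_column is an assumed primary key name; adjust it to the real schema.
    """
    if raw_value is None:
        return  # Tombstone record: nothing to apply.
    event = json.loads(raw_value)
    # Accept both the schema-wrapped form and a bare envelope.
    envelope = event.get("payload", event)
    op = envelope.get("op")
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row in "after".
        row = envelope["after"]
        cache[row[key_column]] = row
    elif op == "d":
        # Delete carries the key of the removed row in "before".
        row = envelope["before"]
        cache.pop(row[key_column], None)


# Hand-written events for illustration only.
events = [
    {"op": "r", "before": None, "after": {"order_id": 1, "customer_id": 1, "total": 100}},
    {"op": "c", "before": None, "after": {"order_id": 2, "customer_id": 5, "total": 500}},
    {"op": "u", "before": None, "after": {"order_id": 1, "customer_id": 1, "total": 150}},
    {"op": "d", "before": {"order_id": 2}, "after": None},
]

orders_cache = {}
for event in events:
    apply_change(orders_cache, json.dumps({"payload": event}))

print(orders_cache)
```

In a real pipeline, the raw values would come from a Kafka client consuming the dbz.public.orders topic rather than from a hard-coded list.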