One of the more elegant aspects of Redpanda is that it is compatible with the Kafka API and ecosystem. So when you want to migrate data from Kafka or replicate data between Redpanda clusters, the MirrorMaker 2 (MM2) tool bundled in the Kafka download package is a natural solution.
This section describes the process of configuring MirrorMaker 2 to replicate data from one Redpanda cluster to another.
MirrorMaker 2 reads a configuration file that specifies the connection details for the Redpanda clusters, as well as other options. After you start MirrorMaker 2, replication continues according to the configuration read at launch time until you shut down the MirrorMaker 2 process.
For details on MirrorMaker 2 and its options, see the Kafka documentation.
To set up the replication, we'll need:
MirrorMaker 2 host - You can install MirrorMaker 2 on a separate system or on one of the Redpanda clusters, as long as the IP addresses and ports of each cluster are reachable from the MirrorMaker 2 host. You must install the Java Runtime Environment (JRE) on the MirrorMaker 2 host.
Install MirrorMaker 2
MirrorMaker 2 is run by a shell script that is part of the Kafka download package. To install MirrorMaker 2 on the machine that you want to run the replication between the clusters:
Download the Kafka package (this example uses version 3.0.0):
curl -O https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
Extract the files from the archive:
tar -xvf kafka_2.13-3.0.0.tgz
Create the MirrorMaker 2 config files
MirrorMaker 2 uses configuration files to get the connection details for the clusters. You can find the MirrorMaker 2 script and configuration files in the expanded Kafka directory.
The sample configuration describes a number of the settings for MirrorMaker 2.
To create a basic configuration file, go to the config directory and run this command:
cat << EOF > mm2.properties
# Name the clusters
clusters = <cluster_name_1>, <cluster_name_2>
# Assign IP addresses to the cluster names
<cluster_name_1>.bootstrap.servers = <cluster_1_ip>:9092
<cluster_name_2>.bootstrap.servers = <cluster_2_ip>:9092
# Set replication for all topics from Redpanda 1 to Redpanda 2
<cluster_name_1>-><cluster_name_2>.enabled = true
<cluster_name_1>-><cluster_name_2>.topics = .*
# Set the replication factor of newly created remote topics
replication.factor = 1
# Make sure that your target cluster can accept larger message sizes. For example, for 30MB messages:
<cluster_name_2>.producer.max.request.size = 31457280
EOF
Remember to edit the placeholder values in these examples for your environment. For example, <cluster_name_1> might become source.
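As an illustration only, with hypothetical cluster names source and target and placeholder addresses, a filled-in mm2.properties might look like this:

```properties
# Name the clusters
clusters = source, target
# Assign addresses to the cluster names (placeholders, not real hosts)
source.bootstrap.servers = 192.168.0.10:9092
target.bootstrap.servers = 192.168.0.20:9092
# Replicate all topics from source to target
source->target.enabled = true
source->target.topics = .*
# Replication factor for newly created remote topics
replication.factor = 1
```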
Run MirrorMaker 2 to start replication
To start MirrorMaker 2, run this from the kafka_2.13-3.0.0/bin/ directory:

./connect-mirror-maker.sh ../config/mm2.properties
With this command, MirrorMaker 2 consumes all topics from the <cluster_name_1> cluster and replicates them into the <cluster_name_2> cluster.
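Each source->target flow is one-directional. If you also want changes on <cluster_name_2> replicated back to <cluster_name_1> (an active/active setup), you would add the reverse flow to mm2.properties as well, for example:

```properties
# Reverse direction: replicate all topics from cluster 2 back to cluster 1
<cluster_name_2>-><cluster_name_1>.enabled = true
<cluster_name_2>-><cluster_name_1>.topics = .*
```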
MirrorMaker 2 adds the prefix <cluster_name_1>. to the names of replicated topics. For example, a topic named foo on the source cluster becomes <cluster_name_1>.foo on the destination.
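The prefix comes from MirrorMaker 2's default replication policy, which joins the source cluster name and the original topic name with a dot. A quick sketch of the naming, using a hypothetical cluster name source:

```shell
# Compute the replicated topic name the way MirrorMaker 2's
# DefaultReplicationPolicy does: "<source cluster name>.<topic name>"
src_cluster="source"        # stands in for <cluster_name_1>
topic="twitch_chat"
echo "${src_cluster}.${topic}"    # prints source.twitch_chat
```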
See migration in action
Here are the basic commands to produce and consume streams:
Create a topic in the source cluster. We'll call it "twitch_chat":
rpk topic create twitch_chat --brokers <node IP>:<kafka API port>
Produce messages to the topic:
rpk topic produce twitch_chat --brokers <node IP>:<kafka API port>
Type text into the topic and press Ctrl + D to separate messages.
Press Ctrl + C to exit the produce command.
Consume (or read) the messages from the destination cluster. Because MirrorMaker 2 prefixes replicated topic names, consume from the prefixed topic:

rpk topic consume <cluster_name_1>.twitch_chat --brokers <node IP>:<kafka API port>
Each message is shown with its metadata, like this:

{
  "message": "How do you stream with Redpanda?\n",
  ...
}
Now you know the replication is working.
Stop MirrorMaker 2
To stop the MirrorMaker 2 process, use top to find its process ID, and then run:
kill <MirrorMaker 2 pid>
By default, Apache Kafka® limits you to a 1MB maximum message size. Redpanda doesn't impose such a limit.
However, when replicating larger messages with MirrorMaker 2, the target cluster can reject them with an error like:
org.apache.kafka.common.errors.RecordTooLargeException: The message is xxxx bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
To address this issue, make sure that your mm2.properties configuration file for the target cluster allows bigger message sizes. For example, for 30MB messages, you'd have the following line in the configuration file:

<cluster_name_2>.producer.max.request.size = 31457280
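For reference, max.request.size is specified in bytes, so a 30MB limit works out like this:

```shell
# 30 MB expressed in bytes (30 * 1024 * 1024)
echo $((30 * 1024 * 1024))    # prints 31457280
```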
If you run into any difficulty with data migration, contact us in our Slack community.