Manage Topics
Topics provide a way to organize events in a data streaming platform. When you create a topic, the default topic settings from the cluster configuration file are applied, unless you specify a different setting on the command line.
The following table shows the topic property settings listed in the cluster configuration file along with their default values, and the equivalent topic property names:
| Cluster property | Default | Topic property |
|---|---|---|
| log_cleanup_policy | delete | cleanup.policy |
| retention_bytes | null (no limit) | retention.bytes |
| delete_retention_ms | 604800000 ms (1 week) | retention.ms |
| log_segment_ms | null (no limit) | segment.ms |
| log_segment_size | 134217728 bytes (128 MiB) | segment.bytes |
| log_compression_type | producer | compression.type |
| log_message_timestamp_type | CreateTime | message.timestamp.type |
| kafka_batch_max_bytes | 1048576 bytes (1 MiB) | max.message.bytes |

For details about topic properties, refer to Topic Configuration Properties.
These default settings are best suited to a one-broker cluster in a development environment. To learn how to modify the default values in the configuration file, see Configure Cluster Properties. Even if you set default values that work for most topics, you may still want to change some properties for a specific topic.
Create a topic
Creating a topic can be as simple as specifying a name for your topic on the command line. For example, to create a topic named xyz:
rpk topic create xyz
This command creates a topic named xyz with one partition and one replica, since these are the default values set in the cluster configuration file.
But suppose you want to create a topic with different values for these settings. The guidelines in this section show you how to choose the number of partitions and replicas for your use case.
Choose the number of partitions
A partition acts as a log file where topic data is written. Dividing topics into partitions allows producers to write messages in parallel and consumers to read messages in parallel. The higher the number of partitions, the greater the throughput.
As a general rule, select a number of partitions that corresponds to the maximum number of consumers in any consumer group that will consume the data.
For example, suppose you plan to create a consumer group with 10 consumers. To create topic xyz with 10 partitions:
rpk topic create xyz -p 10
Choose the replication factor
Replicas are copies of partitions that are distributed across different brokers, so if one broker goes down, other brokers still have a copy of the data. The default replication factor in the cluster configuration is set to 1.
By choosing a replication factor greater than 1, you ensure that each partition has a copy of its data on at least one other broker. One replica acts as the leader, and the other replicas are followers.
To specify a replication factor of 3 for topic xyz:
rpk topic create xyz -r 3
The replication factor must be an odd number. Redpanda Data recommends a replication factor of 3 for most use cases.
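If you know both settings up front, you can combine the two flags shown above in a single command. For example, to create topic xyz with 10 partitions and a replication factor of 3:
rpk topic create xyz -p 10 -r 3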
Update topic configurations
After you create a topic, you can update the topic property settings for all new data written to it. For example, you can add partitions, change the replication factor, or change a configuration setting like the cleanup policy.
Add partitions
You can assign a certain number of partitions when you create a topic, and add partitions later. For example, suppose you add brokers to your cluster, and you want to take advantage of the additional processing power. To increase the number of partitions for existing topics, run:
rpk topic add-partitions [TOPICS...] --num [#]
Note that --num specifies the number of partitions to add, not the total number of partitions.
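For example, to add two partitions to the xyz topic created earlier:
rpk topic add-partitions xyz --num 2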
Change the replication factor
Suppose you create a topic with the default replication factor of 1 (which is specified in the cluster properties configuration file). Now you want to change the replication factor to 3, so you can have two backups of topic data in case a broker goes down. To set the replication factor to 3 for all new data written to the topic, run:
rpk topic alter-config [TOPICS...] --set replication.factor=3
The replication factor can’t exceed the number of Redpanda brokers. If you try to set a replication factor greater than the number of brokers, the request is rejected.
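For example, assuming a cluster with at least three brokers, the following changes the replication factor of topic xyz to 3:
rpk topic alter-config xyz --set replication.factor=3
You can confirm the new replica count in the SUMMARY section of rpk topic describe xyz.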
Change the cleanup policy
The cleanup policy determines how to clean up the partition log files when they reach a certain size:
- delete deletes data based on age or log size.
- compact compacts the data by only keeping the latest values for each key.
- compact,delete combines both methods.
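If you already know which policy a topic needs, you can also set it when you create the topic. A brief sketch, assuming the -c (--topic-config) flag of rpk topic create, which sets a topic property as a key=value pair, and a hypothetical topic named logs:
rpk topic create logs -c cleanup.policy=compact,delete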
Suppose you configure a topic using the default cleanup policy delete, and you want to change the policy to compact. Run the rpk topic alter-config command as shown:
rpk topic alter-config [TOPICS...] --set cleanup.policy=compact
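For example, to switch the xyz topic to compaction only:
rpk topic alter-config xyz --set cleanup.policy=compact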
Remove a configuration setting
You can remove a configuration that overrides the default setting, and the setting will use the default value again. For example, suppose you altered the cleanup policy to use compact instead of the default, delete. Now you want to return the policy setting to the default. To remove the configuration setting cleanup.policy=compact, run rpk topic alter-config with the --delete flag as shown:
rpk topic alter-config [TOPICS...] --delete cleanup.policy
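For example, to reset the cleanup policy of the xyz topic back to the cluster default:
rpk topic alter-config xyz --delete cleanup.policy
Afterward, rpk topic describe xyz -c should report cleanup.policy with the DEFAULT_CONFIG source again.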
List topic configuration settings
To display all the configuration settings for a topic, run:
rpk topic describe <topic-name> -c
The -c flag limits the command output to just the topic configurations. This command is useful for checking the default configuration settings before you make any changes, and for verifying changes after you make them.
The following command output displays after running rpk topic describe test_topic, where test_topic was created with default settings:
rpk topic describe test_topic
SUMMARY
=======
NAME        test_topic
PARTITIONS  1
REPLICAS    1
CONFIGS
=======
KEY                           VALUE                        SOURCE
cleanup.policy                delete                       DYNAMIC_TOPIC_CONFIG
compression.type              producer                     DEFAULT_CONFIG
max.message.bytes             1048576                      DEFAULT_CONFIG
message.timestamp.type        CreateTime                   DEFAULT_CONFIG
redpanda.datapolicy           function_name: script_name:  DEFAULT_CONFIG
redpanda.remote.delete        true                         DEFAULT_CONFIG
redpanda.remote.read          false                        DEFAULT_CONFIG
redpanda.remote.write         false                        DEFAULT_CONFIG
retention.bytes               -1                           DEFAULT_CONFIG
retention.local.target.bytes  -1                           DEFAULT_CONFIG
retention.local.target.ms     86400000                     DEFAULT_CONFIG
retention.ms                  604800000                    DEFAULT_CONFIG
segment.bytes                 1073741824                   DEFAULT_CONFIG
Now suppose you add two partitions and increase the number of replicas to 3. The new command output confirms the changes in the SUMMARY section:
SUMMARY
=======
NAME        test_topic
PARTITIONS  3
REPLICAS    3
Delete a topic
To delete a topic, run:
rpk topic delete <topic-name>
When a topic is deleted, its underlying data is deleted, too.
To delete multiple topics at a time, provide a space-separated list. For example, to delete two topics named topic1 and topic2, run:
rpk topic delete topic1 topic2
You can also use the -r flag to specify one or more regular expressions; then, any topic names that match the patterns you specify are deleted. For example, to delete topics with names that start with “f” and end with “r”, run:
rpk topic delete -r '^f.*' '.*r$'
Note that the first regular expression must start with the ^ symbol, and the last expression must end with the $ symbol. This requirement helps prevent accidental deletions.
Delete records from a topic
Redpanda lets you delete data from the beginning of a partition up to a specific offset (event). The offset represents the true creation time of the event, not the time when it was stored by Redpanda. Deleting records frees up disk space, which is especially helpful if your producers are pushing more data than you anticipated in your retention plan. Do this when you know that all consumers have read up to that given offset, and the data is no longer needed.
There are different ways to delete records from a topic, including using the rpk topic trim command or using the DeleteRecords Kafka API with Kafka clients.
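For example, a minimal sketch, assuming a topic named xyz whose consumers have all read past offset 100; in recent rpk versions this command is provided as rpk topic trim-prefix with an --offset flag, so check rpk topic trim-prefix --help for the exact syntax in your version:
rpk topic trim-prefix xyz --offset 100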