Audit Logging

Many scenarios for streaming data include the need for fine-grained auditing of user activity related to the system. This is especially true for regulated industries such as finance, healthcare, and the public sector. Complying with PCI DSS v4 standards, for example, requires verbose and detailed activity auditing, alerting, and analysis capabilities.

Redpanda’s auditing capabilities support recording both administrative and operational interactions with topics and with users. Redpanda complies with the Open Cybersecurity Schema Framework (OCSF), providing a predictable and extensible solution that works seamlessly with industry standard tools.

With audit logging enabled, there should be no noticeable changes in performance other than slightly elevated CPU usage.

Audit logging is configured at the cluster level. Redpanda supports excluding specific topics or principals from auditing to help reduce noise in the log. Audit logging is disabled by default.

Audit log flow

The Redpanda audit log mechanism functions similarly to the Kafka flow you may be familiar with. When a user interacts with another user or with a topic, Redpanda writes an event to a specialized audit topic. The audit topic is immutable: only Redpanda can write to it. Users cannot write to the audit topic directly, and the Kafka API cannot create or delete it.

By default, any management and authentication actions performed on the cluster yield messages written to the audit log topic that are retained for seven days. Interactions with all topics by all principals are audited. Actions performed using the Kafka API and Admin API are all audited, as are actions performed directly through rpk.

Messages recorded to the audit log topic comply with the Open Cybersecurity Schema Framework (OCSF). Any number of analytics frameworks, such as Splunk or Sumo Logic, can receive and process these messages. Using an open standard ensures Redpanda’s audit logs coexist with those produced by other IT assets, powering holistic monitoring and analysis of your assets.
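To see what an analytics platform would receive, you can read a few records from the audit topic with rpk. The following is a minimal sketch, assuming rpk is already configured to reach your cluster and that your principal is permitted to read _redpanda.audit_log. The peek_audit_log function name is hypothetical and not part of rpk.

```shell
# Hypothetical helper: print the first N retained records of the audit log topic.
# Assumes rpk is configured for your cluster and your principal may read
# the _redpanda.audit_log topic.
peek_audit_log() {
  count="${1:-5}"   # number of records to read; defaults to 5
  rpk topic consume _redpanda.audit_log --offset start --num "$count"
}

# Example: peek_audit_log 3
```

Reading from the start offset returns the oldest records still retained, which is usually enough to confirm that events are arriving before you wire up a downstream consumer.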

Audit log configuration options

Redpanda’s audit logging mechanism supports several options to control the volume and availability of audit records. Configuration is applied at the cluster level using the standard cluster configuration mechanism.

  • audit_enabled: Boolean value to enable audit logging. When you set this to true, Redpanda checks for an existing topic named _redpanda.audit_log. If none is found, Redpanda automatically creates one for you. Default: false.

  • audit_log_num_partitions: Integer value defining the number of partitions used by a newly created audit topic. This configuration applies only to the audit log topic and may be different from the cluster or other topic configurations. This cannot be altered for an existing audit log topic. Default: 12.

  • audit_log_replication_factor: Optional integer value defining the replication factor for a newly created audit log topic. This configuration applies only to the audit log topic and may be different from the cluster or other topic configurations. This cannot be altered for an existing audit log topic. If no value is provided, Redpanda uses the internal_topic_replication_factor cluster configuration value. Default: null.

  • audit_client_max_buffer_size: Integer value defining the number of bytes allocated by the internal audit client for audit messages. When changing this, you must disable audit logging and then re-enable it for the change to take effect. Consider increasing this if your system generates a very large number of audit records in a short amount of time. Default: 16777216.

  • audit_queue_max_buffer_size_per_shard: Integer value defining the maximum amount of memory in bytes used by the audit buffer in each shard. Once this size is reached, requests to log additional audit messages will return a non-retryable error. You must restart the cluster when changing this value. Default: 1048576.

  • audit_enabled_event_types: List of strings in JSON style identifying the event types to include in the audit log. This may include any of the following: management, produce, consume, describe, heartbeat, authenticate, schema_registry, admin. Default: '["management","authenticate","admin"]'.

  • audit_exclude_topics: List of strings in JSON style identifying the topics the audit logging system should ignore. This list cannot include the _redpanda.audit_log topic. Redpanda will reject the command if you do attempt to include that topic. Default: null.

  • audit_queue_drain_interval_ms: Internally, Redpanda batches audit log messages in memory and periodically writes them to the audit log topic. This defines the period in milliseconds between draining this queue to the audit log topic. Longer intervals may help prevent duplicate messages, especially in high throughput scenarios, but they also increase the risk of data loss during hard shutdowns where the queue is lost. Default: 500.

  • audit_exclude_principals: List of strings in JSON style identifying the principals the audit logging system should ignore. Principals can be listed as either User:name or name. Default: null.
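Because audit_queue_max_buffer_size_per_shard is a per-shard limit, the upper bound on audit queue memory for one broker scales with its shard count. A small worked example, assuming a hypothetical broker with 8 shards (the shard count and the function name are illustrative only):

```shell
# Upper bound on audit queue memory for one broker:
# per-shard limit (bytes) multiplied by the number of shards.
audit_queue_bound_bytes() {
  per_shard_bytes="$1"
  shards="$2"
  echo $(( per_shard_bytes * shards ))
}

# Default per-shard limit (1048576 bytes = 1 MiB) on a hypothetical 8-shard broker.
audit_queue_bound_bytes 1048576 8   # prints 8388608 (8 MiB)
```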

Even though audited event messages are stored to a specialized immutable topic, standard topic settings still apply. For example, you can apply the same Tiered Storage, retention time, and replication settings available to normal topics. These particular options are important for controlling the amount of disk space utilized by your audit topics.

You must configure certain audit logging properties before enabling audit logging because these settings impact the creation of the _redpanda.audit_log topic itself. These properties include: audit_log_num_partitions and audit_log_replication_factor. The Kafka API allows you to add partitions or alter the replication factor after enabling audit logging, but Redpanda prevents you from altering these two configuration values directly.

Audit logging event types

Redpanda’s auditable events fall into one of eight different event types. The APIs associated with each event type are as follows.

Audit event type, followed by its associated APIs:

management

  • AlterPartitionReassignments

  • CreateAcls

  • CreatePartitions

  • CreateTopics

  • DeleteAcls

  • DeleteGroups

  • DeleteRecords

  • DeleteTopics

  • IncrementalAlterConfigs

  • OffsetDelete

produce

  • AddPartitionsToTxn

  • EndTxn

  • InitProducerId

  • Produce

consume

  • AddOffsetsToTxn

  • Fetch

  • JoinGroup

  • LeaveGroup

  • ListOffsets

  • OffsetCommit

  • SyncGroup

  • TxnOffsetCommit

describe

  • DescribeAcls

  • DescribeConfigs

  • DescribeGroups

  • DescribeLogDirs

  • FindCoordinator

  • ListGroups

  • ListPartitionReassignments

  • Metadata

  • OffsetForLeaderEpoch

  • DescribeProducers

  • DescribeTransactions

  • ListTransactions

heartbeat

  • Heartbeat

authenticate

  • All authentication events

schema_registry

  • All Schema Registry API calls

admin

  • All Admin API calls
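Since audit_enabled_event_types accepts only the eight event type names listed above, it can help to check a list before applying it. The following is a minimal sketch of such a check. The build_event_types function is a hypothetical helper, not part of rpk or Redpanda.

```shell
# Hypothetical helper: turn event type names into the JSON-style list that
# audit_enabled_event_types expects, rejecting names outside the eight types.
build_event_types() {
  list=""
  for name in "$@"; do
    case "$name" in
      management|produce|consume|describe|heartbeat|authenticate|schema_registry|admin)
        # Append the quoted name, comma-separated after the first entry.
        list="${list:+$list,}\"$name\""
        ;;
      *)
        echo "unknown audit event type: $name" >&2
        return 1
        ;;
    esac
  done
  echo "[$list]"
}

# Example (applies the result with rpk):
#   rpk cluster config set audit_enabled_event_types "$(build_event_types management authenticate admin)"
```

Validating locally catches a misspelled event type before the value ever reaches the cluster.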

Enable audit logging

All audit log settings are applied at the cluster level. You can configure audit log settings in the Redpanda Helm chart, using Helm values or the Redpanda resource with the Redpanda Operator.

Use rpk cluster config to configure audit logs. Some options require a cluster restart. You can check whether a restart is pending using rpk cluster config status.
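One way to make the restart check routine is to run the status command immediately after each change. The following is a minimal sketch, assuming rpk is configured for your cluster. The set_and_check function name is hypothetical.

```shell
# Hypothetical helper: set one cluster property, then print the configuration
# status so you can see whether any broker reports a pending restart.
set_and_check() {
  property="$1"
  value="$2"
  rpk cluster config set "$property" "$value" && rpk cluster config status
}

# Example: set_and_check audit_queue_max_buffer_size_per_shard 2097152
```

The status command runs only if the set command succeeds, so a rejected value is not followed by a misleading status report.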

Some key tuning recommendations for your audit logging settings include:

  • If you wish to change the number of partitions or the replication factor for your audit log topic, set the audit_log_num_partitions and audit_log_replication_factor properties respectively.

  • Choose the type of events needed by setting audit_enabled_event_types to the desired list of event categories. Keep this as restrictive as possible based on your compliance and security needs to avoid excessive noise in your audit logs.

  • Identify non-sensitive topics so that you can exclude them from auditing. Specify this list of topics in audit_exclude_topics.

  • Identify non-sensitive principals so that you can exclude them from auditing. Specify this list of principals in audit_exclude_principals. This property accepts names in the form of name or User:name.

  • Set audit_enabled to true.

  • Optimize costs for audit logging.

The sequence of commands in rpk for this audit log configuration is:

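# Set creation-time properties before enabling audit logging
rpk cluster config set audit_log_num_partitions 6
rpk cluster config set audit_log_replication_factor 5
# Choose what to audit and what to exclude
rpk cluster config set audit_enabled_event_types '["management","describe","authenticate"]'
rpk cluster config set audit_exclude_topics '["topic1","topic2"]'
rpk cluster config set audit_exclude_principals '["User:principal1","principal2"]'
# Enable audit logging, which creates the _redpanda.audit_log topic if needed
rpk cluster config set audit_enabled true
# Retain audit records for three days (259200000 ms)
rpk topic alter-config _redpanda.audit_log --set retention.ms=259200000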

Optimize costs for audit logging

When enabled, audit logging can quickly generate a very large amount of data, especially if all event types are selected. Proper configuration of audit logging is critical to avoid filling your disk or using excess Tiered Storage. The configuration options available help ensure your audit logs contain only the volume of data necessary to meet your regulatory or legal requirements.

With audit logging, the pattern of message generation may be very different from your typical sources of data. These messages reflect usage of your system as opposed to the operational data your topics typically process. As a result, your retention, replication, and Tiered Storage requirements may differ from your other topics.

A typical scenario with audit logging is to route the messages to an analytics platform like Splunk. If your retention period is too long, you will find that you are storing excessive amounts of replicated messages in both Redpanda and in your analytics suite. Identifying the right balance of retention and replication settings minimizes this duplication while retaining your data in a system that provides actionable intelligence.

Assess the retention needs for your audit logs. You may not need to keep the logs around for the default seven days. This is controlled by setting retention.ms for the _redpanda.audit_log topic or by setting delete_retention_ms at the cluster level.
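The retention.ms property takes a value in milliseconds, so a retention period expressed in days has to be converted first. A small worked sketch of that arithmetic follows. The days_to_ms function is a hypothetical helper.

```shell
# Convert a retention period in days to the milliseconds that retention.ms expects.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 3   # prints 259200000
days_to_ms 7   # prints 604800000 (the default seven days)

# Example (applies a three-day retention to the audit topic):
#   rpk topic alter-config _redpanda.audit_log --set retention.ms="$(days_to_ms 3)"
```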