Audit Logging in Kubernetes

Many scenarios for streaming data include the need for fine-grained auditing of user activity related to the system. This is especially true for regulated industries such as finance, healthcare, and the public sector. Complying with PCI DSS v4 standards, for example, requires verbose and detailed activity auditing, alerting, and analysis capabilities.

Redpanda’s auditing capabilities support recording both administrative and operational interactions with topics and with users. Redpanda complies with the Open Cybersecurity Schema Framework (OCSF), providing a predictable and extensible solution that works seamlessly with industry standard tools.

With audit logging enabled, there should be no noticeable changes in performance other than slightly elevated CPU usage.

Audit logging is configured at the cluster level. Redpanda supports excluding specific topics or principals from auditing to help reduce noise in the log. Audit logging is disabled by default.

Audit log flow

The Redpanda audit log mechanism functions similar to the Kafka flow you may be familiar with. When a user interacts with another user or with a topics, Redpanda writes an event to a specialized audit topic. The audit topic is immutable. Only Redpanda can write to it. Users are prevented from writing to the audit topic directly and the Kafka API cannot create or delete it.

Audit log flow

By default, any management and authentication actions performed on the cluster yield messages written to the audit log topic that are retained for seven days. Interactions with all topics by all principals are audited. Actions performed using the Kafka API and Admin API are all audited, as are actions performed directly through rpk.

Messages recorded to the audit log topic comply with the open cybersecurity schema framework. Any number of analytics frameworks, such as Splunk or Sumo Logic, can receive and process these messages. Using an open standard ensures Redpanda’s audit logs coexist with those produced by other IT assets, powering holistic monitoring and analysis of your assets.

Audit log configuration options

Redpanda’s audit logging mechanism supports several options to control the volume and availability of audit records. Configuration is applied at the cluster level using the standard cluster configuration mechanism.

You can configure these options directly in either the Helm values or the Redpanda resource.

  • auditLogging.enabled: Sets the value of the audit_enabled cluster property to enable audit logging. When you set this to true, Redpanda checks for an existing topic named _redpanda.audit_log. If none is found, Redpanda automatically creates one for you. Default: false.

  • auditLogging.partitions: Sets the value of the audit_log_num_partitions cluster property to define the number of partitions used by a newly created audit topic. This configuration applies only to the audit log topic and may be different from the cluster or other topic configurations. This cannot be altered for an existing audit log topic. Default: 12.

  • auditLogging.replicationFactor: Sets the value of the audit_log_replication_factor cluster property to define the replication factor for a newly created audit log topic. This configuration applies only to the audit log topic and may be different from the cluster or other topic configurations. This cannot be altered for existing audit log topics. If a value is not provided, Redpanda will use the internal_topic_replication_factor cluster config value. Default: null.

  • auditLogging.enabledEventTypes: Sets the value of the audit_enabled_event_types cluster property. This option is a list of JSON strings identifying the event types to include in the audit log. Valid values include any of the following - management, produce, consume, describe, heartbeat, authenticate, schema_registry, admin. Default: '["management","authenticate","admin"]'.

  • auditLogging.excludedTopics: Sets the value of the audit_exclude_topics cluster property. This option is a list of JSON strings identifying the topics the audit logging system should ignore. This list cannot include the _redpanda.audit_log topic. Redpanda will reject the command if you do attempt to include that topic. Default: null.

  • auditLogging.excludedPrincipals: Sets the value of the audit_exclude_principals cluster property. This option is a list of JSON strings identifying the principals the audit logging system should ignore. Principals can be listed as User:name or name, both are accepted. Default: null.

  • auditLogging.clientMaxBufferSize: Sets the value of the audit_client_max_buffer_size cluster property to define the number of bytes allocated by the internal audit client for audit messages. When changing this, you must disable audit logging and then re-enable it for the change to take effect. Consider increasing this if your system generates a very large number of audit records in a short amount of time. Default: 16777216.

  • auditLogging.queueDrainIntervalMs: Sets the value of the audit_queue_drain_interval_ms cluster property. Internally, Redpanda batches audit log messages in memory and periodically writes them to the audit log topic. This option defines the period in milliseconds between draining this queue to the audit log topic. Longer intervals may help prevent duplicate messages, especially in high throughput scenarios, but they also increase the risk of data loss during hard shutdowns where the queue is lost. Default: 500.

  • auditLogging.queueMaxBufferSizePerShard: Sets the value of the audit_queue_max_buffer_size_per_shard cluster property to define the maximum amount of memory in bytes used by the audit buffer in each shard. Once this size is reached, requests to log additional audit messages will return a non-retryable error. Default: 1048576.

Even though audited event messages are stored to a specialized immutable topic, standard topic settings still apply. For example, you can apply the same Tiered Storage, retention time, and replication settings available to normal topics. These particular options are important for controlling the amount of disk space utilized by your audit topics.

You cannot change the values of auditLogging.partitions and auditLogging.replicationFactor after enabling audit logging because these settings impact the creation of the _redpanda.audit_log topic. The Kafka API allows you to add partitions or alter the replication factor after enabling audit logging, but Redpanda prevents you from altering these two configuration values directly.

Audit logging event types

Redpanda’s auditable events fall into one of eight different event types. The APIs associated with each event type are as follows.

Audit event type Associated APIs

management

  • AlterPartitionReassignments

  • CreateACLs

  • CreatePartitions

  • CreateTopics

  • DeleteAcls

  • DeleteGroups

  • DeleteRecords

  • DeleteTopics

  • IncrementalAlterconfigs

  • OffsetDelete

produce

  • AddPartitionsToTxn

  • EndTxn

  • InitProducerId

  • Produce

consume

  • AddOffsetsToTxn

  • Fetch

  • JoinGroup

  • LeaveGroup

  • ListOffset

  • OffsetCommit

  • SyncGroup

  • TxOffsetCommit

describe

  • DescribeAcls

  • DescribeConfigs

  • DescribeGroups

  • DescribeLogDirs

  • FindCoordinator

  • ListGroups

  • ListPartitionReassignments

  • Metadata

  • OffsetForLeaderEpoch

  • DescribeProducers

  • DescribeTransations

  • ListTransactions

heartbeat

  • Heartbeat

authenticate

  • All authentication events

schema_registry

  • All Schema Registry API calls

admin

  • All Admin API calls

Enable audit logging

All audit log settings are applied at the cluster level. You can configure audit log settings in the Redpanda Helm chart, using Helm values or the Redpanda resource with the Redpanda Operator.

  • Helm + Operator

  • Helm

If you want to manage the audit topic using a Topic resource:

  1. Create the Redpanda resource with SASL enabled:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        auth:
          sasl:
            enabled: true
            secretRef: "redpanda-users"
  2. Create a Secret to hold the password of either a user or a superuser:

    kubectl --namespace <namespace> create secret generic audit-topic-user-password --from-literal=password=<password>
    If you aren’t using a superuser, make sure the user is authorized to manage the _redpanda.audit_log topic.
  3. Create the Topic resource, and make sure to set overwriteTopicName to _redpanda.audit_log:

    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Topic
    metadata:
      name: audit-topic
    spec:
      partitions: <partitions>
      overwriteTopicName: _redpanda.audit_log
      kafkaApiSpec:
        brokers:
          - "redpanda-0.redpanda.<namespace>.svc.cluster.local:9093"
          - "redpanda-1.redpanda.<namespace>.svc.cluster.local:9093"
          - "redpanda-2.redpanda.<namespace>.svc.cluster.local:9093"
        tls:
          caCertSecretRef:
            name: "redpanda-default-cert"
            key: "ca.crt"
        sasl:
          username: <username>
          mechanism: <mechanism>
          passwordSecretRef:
            name: audit-topic-user-password
            key: password

    This example configuration contains values for the kafkaApiSpec.tls and kafkaApiSpec.brokers settings that work with the default Helm values. If you modified these settings in your deployment, make sure to replace those values.

  4. Update the Redpanda resource to enable audit logging:

    redpanda-cluster.yaml
    apiVersion: cluster.redpanda.com/v1alpha1
    kind: Redpanda
    metadata:
      name: redpanda
    spec:
      chartRef: {}
      clusterSpec:
        auth:
          sasl:
            enabled: true
            secretRef: "redpanda-users"
        auditLogging:
          enabled: true

If you don’t want to use the Topic resource, you can enable audit logging and Redpanda will create the audit topic for you:

redpanda-cluster.yaml
apiVersion: cluster.redpanda.com/v1alpha1
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef: {}
  clusterSpec:
    auth:
      sasl:
        enabled: true
        secretRef: "redpanda-users"
    auditLogging:
      enabled: true
kubectl apply -f redpanda-cluster.yaml --namespace <namespace>
  • --values

  • --set

audit-logging.yaml
auth:
  sasl:
    enabled: true
    secretRef: "redpanda-users"
auditLogging:
  enabled: true
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
--values audit-logging.yaml --reuse-values
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef="redpanda-users" \
  --set auditLogging.enabled=true
  • auth.sasl.enabled: To enable audit logging, you must have:

    • SASL authentication enabled.

    • At least one Kafka listener that uses the SASL authentication mechanism. By default, the internal Kafka listener is used (listeners.kafka). To use a different listener, see Audit log configuration options.

    • At least one superuser.

  • auditLogging.enabled: Enable audit logging. When you set this to true, Redpanda checks for an existing topic named _redpanda.audit_log. If the topic is not found, Redpanda automatically creates one for you. Default: false.

Optimize costs for audit logging

When enabled, audit logging can quickly generate a very large amount of data, especially if all event types are selected. Proper configuration of audit logging is critical to avoid filling your disk or using excess Tiered Storage. The configuration options available help ensure your audit logs contain only the volume of data necessary to meeting your regulatory or legal requirements.

With audit logging, the pattern of message generation may be very different from your typical sources of data. These messages reflect usage of your system as opposed to the operational data your topics typically process. As a result, your retention, replication, and Tiered Storage requirements may differ from your other topics.

A typical scenario with audit logging is to route the messages to an analytics platform like Splunk. If your retention period is too long, you will find that you are storing excessive amounts of replicated messages in both Redpanda and in your analytics suite. Identifying the right balance of retention and replication settings minimizes this duplication while retaining your data in a system that provides actionable intelligence.

Assess the retention needs for your audit logs. You may not need to keep the logs around for the default seven days. This is controlled by setting retention.ms for the _redpanda.audit_log topic or by setting delete_retention_ms at the cluster level.