Configure Client Connections

Optimize the availability of your clusters by configuring and tuning client connection properties.

Limit client connections

A malicious Kafka client application may create many network connections to execute its attacks. A poorly configured application may also create an excessive number of connections. To mitigate the risk of a client creating too many connections and using too many system resources, you can configure a Redpanda cluster to impose limits on the number of created client connections.

Before you configure connection limits, note the following:

  • These connection limit properties are disabled by default. You must manually enable them.

  • The total number of connections is not equal to the number of clients, because a client can open multiple connections. As a conservative estimate, for a cluster with N brokers, plan for N + 2 connections per client.
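As a rough illustration of the N + 2 estimate above, the following Python sketch computes how many connections to plan for. The function name and the example numbers are hypothetical and are not part of any Redpanda tooling.

```python
def estimate_connections(num_clients: int, num_brokers: int) -> int:
    """Conservative estimate: each client may open N + 2 connections,
    where N is the number of brokers in the cluster."""
    if num_clients < 0 or num_brokers < 1:
        raise ValueError("need num_clients >= 0 and num_brokers >= 1")
    return num_clients * (num_brokers + 2)


if __name__ == "__main__":
    # Hypothetical example: 50 clients connecting to a 3-broker cluster.
    print(estimate_connections(num_clients=50, num_brokers=3))  # 250
```

The same arithmetic applies per IP address: if several clients share one host or one NAT gateway, use the number of clients behind that IP as `num_clients`.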

Configure connection count limit by client IP

Use the kafka_connections_max_per_ip property to limit the number of connections from each client IP address.

Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through private link endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client. Similarly, multiple clients running on the same host will share the same IP address, and the limit applies collectively to all those clients.

Configure the limit

To configure kafka_connections_max_per_ip safely without disrupting legitimate clients, follow these steps:

  1. Set up your monitoring stack for your cluster. See Monitor Redpanda Cloud.

  2. Monitor current connection patterns using the redpanda_rpc_active_connections metric with the redpanda_server="kafka" filter:

    redpanda_rpc_active_connections{redpanda_id="CLOUD_CLUSTER_ID", redpanda_server="kafka"}

  3. Analyze the connection data to identify the normal range of connections for each broker during typical traffic cycles. For example, in the following Grafana screenshot, the normal range is around 200-300 connections:

    [Figure: Range of active connections over time]

  4. Set the kafka_connections_max_per_ip value based on your analysis. Use the upper bound of normal connections from step 3, or use a lower value if you know how many connections per client IP are being opened.

  5. Continue monitoring the connection metrics after applying the limit to ensure that legitimate clients are not affected and that any problematic client is properly controlled.
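One way to turn the analysis in steps 3 and 4 into a number is to take a high percentile of the observed connection counts as the "upper bound of normal". The Python sketch below assumes you have already exported samples of redpanda_rpc_active_connections from your monitoring stack into a list. The function name, the nearest-rank percentile method, and the sample values are illustrative assumptions, not a Redpanda recommendation.

```python
import math


def upper_bound_of_normal(samples: list[int], percentile: float = 99.0) -> int:
    """Return the nearest-rank percentile of observed connection counts."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least `percentile` percent
    # of the samples at or below it.
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical samples in the 200-300 range described in step 3.
    observed = [210, 250, 230, 300, 280, 260, 220, 295, 240, 270]
    print(upper_bound_of_normal(observed))  # 300
```

A lower percentile ignores short spikes but risks rejecting legitimate connections during peak traffic, so compare the result against the graph before applying it.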

Limitations

  • Decreasing the limit does not terminate any currently open Kafka API connections.

  • This limit does not apply to Kafka HTTP Proxy connections.

  • Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers.

  • The limit may negatively affect tail latencies across all client connections.

  • All clients behind the shared IP are collectively subject to the single kafka_connections_max_per_ip limit.

  • Connection rejections occur randomly among clients when the limit is reached. For example, suppose kafka_connections_max_per_ip is set to 100, but clients behind a NAT gateway collectively need 150 connections. When the limit is reached, each client may establish only some of its connections while the rest are rejected, which can leave clients unable to function.

  • Redpanda may modify this property during internal operations.

  • Availability incidents caused by misconfiguring this feature are excluded from the Redpanda Cloud SLA.
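The NAT example in the limitations above can be sketched as a small simulation. This is a simplified model only: Redpanda rejects connections randomly, whereas the sketch assumes a round-robin arrival order so that the result is deterministic. The client names and the function are hypothetical.

```python
def simulate_shared_ip(needs: dict[str, int], limit: int) -> dict[str, int]:
    """Admit connections from clients behind one shared IP until the
    per-IP limit is reached. Arrival order is round-robin (an assumption)."""
    remaining = dict(needs)
    admitted = {client: 0 for client in needs}
    total = 0
    while total < limit and any(remaining.values()):
        for client in needs:
            if remaining[client] > 0 and total < limit:
                remaining[client] -= 1
                admitted[client] += 1
                total += 1
    return admitted


if __name__ == "__main__":
    # Three clients behind one NAT gateway need 150 connections in total,
    # but the per-IP limit is 100.
    result = simulate_shared_ip({"a": 50, "b": 50, "c": 50}, limit=100)
    print(result)  # {'a': 34, 'b': 33, 'c': 33}
```

In this model no client gets all 50 of the connections it needs, which illustrates why a limit below the collective demand of a shared IP can degrade every client behind it rather than only one.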

Configure client reconnections

You can configure the Kafka client backoff and retry properties to change the clients' default behavior to suit your failure-handling requirements.

Set the following Kafka client properties on your application’s producer or consumer to manage client reconnections:

  • reconnect.backoff.ms: Amount of time to wait before attempting to reconnect to the broker. The default is 50 milliseconds.

  • reconnect.backoff.max.ms: Maximum amount of time in milliseconds to wait when reconnecting to a broker. The backoff increases exponentially for each consecutive connection failure, up to this maximum. The default is 1000 milliseconds (1 second).
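To make the exponential increase concrete, the following Python sketch lists the wait before each consecutive reconnection attempt, using the defaults above. It assumes the backoff doubles after each failure and ignores the random jitter that real clients typically add, so actual waits will differ somewhat. The function name is hypothetical.

```python
def reconnect_backoff_schedule(
    failures: int,
    reconnect_backoff_ms: int = 50,
    reconnect_backoff_max_ms: int = 1000,
) -> list[int]:
    """Backoff (ms) before each reconnect attempt, doubling per failure
    (assumed factor of 2, no jitter) and capped at the maximum."""
    return [
        min(reconnect_backoff_ms * 2**attempt, reconnect_backoff_max_ms)
        for attempt in range(failures)
    ]


if __name__ == "__main__":
    print(reconnect_backoff_schedule(7))
    # [50, 100, 200, 400, 800, 1000, 1000]
```

With the defaults, the cap is reached on the sixth consecutive failure. Raising reconnect.backoff.max.ms spreads reconnection attempts further apart, which reduces connection churn against a broker that is down.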

Additionally, you can use Kafka producer properties to control message retry behavior. Delivery fails when either the delivery timeout expires or the retry limit is reached, whichever comes first.

  • delivery.timeout.ms: Maximum amount of time allowed for delivering a message, including retries, so that messages are not retried forever. The default is 120000 milliseconds (2 minutes).

  • retries: Number of times a producer can retry sending a message before marking it as failed. The default value is 2147483647 for Kafka >= 2.1, or 0 for Kafka <= 2.0.

  • retry.backoff.ms: Amount of time to wait before attempting to retry a failed request to a given topic partition. The default is 100 milliseconds.
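The interaction between these three properties can be sketched with a simplified model that reports which bound stops delivery first. The model assumes every attempt fails after a fixed duration (attempt_ms, a hypothetical parameter set here to 30000 ms) and that the producer waits retry.backoff.ms between attempts. Real producers also batch and pipeline requests, so treat the numbers as illustrative only.

```python
def first_limit_reached(
    delivery_timeout_ms: int = 120_000,
    retries: int = 2_147_483_647,
    retry_backoff_ms: int = 100,
    attempt_ms: int = 30_000,  # assumed duration of each failed attempt
) -> tuple[int, str]:
    """Return (attempts made, name of the property that ended delivery)."""
    attempts = 0
    elapsed_ms = 0
    while elapsed_ms < delivery_timeout_ms:
        if attempts == retries + 1:  # the first send plus all allowed retries
            return attempts, "retries"
        attempts += 1
        elapsed_ms += attempt_ms + retry_backoff_ms
    return attempts, "delivery.timeout.ms"


if __name__ == "__main__":
    print(first_limit_reached())           # (4, 'delivery.timeout.ms')
    print(first_limit_reached(retries=2))  # (3, 'retries')
```

With the defaults for Kafka >= 2.1, the retry count is effectively unbounded, so delivery.timeout.ms is the property that determines when a message is marked as failed.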