Resolve Errors in Linux

This section describes errors or issues you might encounter while deploying Redpanda in Linux and explains how to troubleshoot them.

Deployment issues

This section addresses common deployment issues encountered during Redpanda setup or upgrades.

A Redpanda Enterprise Edition license is required

During a Redpanda upgrade, if enterprise features are enabled and a valid Enterprise Edition license is missing, Redpanda logs a warning and aborts the upgrade process on the first broker, preventing a successful upgrade. The warning looks similar to the following:

A Redpanda Enterprise Edition license is required to use the currently enabled features. To apply your license, downgrade this broker to the pre-upgrade version and provide a valid license key via rpk using 'rpk cluster license set <key>', or via Redpanda Console. To request an enterprise license, please visit <redpanda.com/upgrade>. To try Redpanda Enterprise for 30 days, visit <redpanda.com/try-enterprise>. For more information, see <https://docs.redpanda.com/current/get-started/licenses>.

If you encounter this message, follow these steps to recover:

  1. Roll back the affected broker to the original version.

  2. Do one of the following:

    • Apply a valid Redpanda Enterprise Edition license to the cluster.

    • Disable enterprise features.

      If you do not have a valid license and want to proceed without using enterprise features, you can disable the enterprise features in your Redpanda configuration.

  3. Retry the upgrade.
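For example, after rolling back, a license can be applied with rpk. This is a sketch; the license file path is a placeholder:

```shell
# Apply an Enterprise Edition license from a file (path is a placeholder)
rpk cluster license set --path /etc/redpanda/redpanda.license

# Confirm that the license was applied
rpk cluster license info
```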

TLS issues

This section covers common TLS errors, their causes, and solutions, including certificate issues and correct client configuration.

Invalid large response size

This error appears when your cluster is configured to use TLS, but you don’t specify that you are connecting over TLS.

unable to request metadata: invalid large response size 352518912 > limit 104857600; the first three bytes received appear to be a tls alert record for TLS v1.2; is this a plaintext connection speaking to a tls endpoint?

If you’re using rpk, make sure to add the -X tls.enabled=true flag, along with any other required TLS flags, such as the TLS certificate:

rpk cluster info -X tls.enabled=true

For all available flags, see the rpk options reference.
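For example, a connection to a TLS-enabled cluster that uses a self-signed CA might look like the following; the broker address and certificate path are placeholders:

```shell
rpk cluster info \
  -X brokers=<redpanda-url>:<port> \
  -X tls.enabled=true \
  -X tls.ca=<path-to-ca-certificate>
```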

Malformed HTTP response

This error appears when a cluster has TLS enabled, and you try to access the admin API without passing the required TLS parameters.

Retrying POST for error: Post "http://127.0.0.1:9644/v1/security/users": net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x03\x00\x02\x02"

If you’re using rpk, make sure to include the TLS flags for the Admin API.
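For example, a command that reaches the Admin API over TLS might include the following flags; the host and CA path are placeholders, and admin.tls.cert and admin.tls.key can be added for mutual TLS:

```shell
rpk security user list \
  -X admin.hosts=<admin-api-host>:9644 \
  -X admin.tls.enabled=true \
  -X admin.tls.ca=<path-to-ca-certificate>
```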

For all available flags, see the rpk options reference.

x509: certificate signed by unknown authority

This error appears when the Certificate Authority (CA) that signed your certificates is not trusted by your system.

Check the following:

  • Ensure you have installed the root CA certificate correctly on your local system.

  • If using a self-signed certificate, ensure it is properly configured and included in your system’s trust store.

  • If you are using a certificate issued by a CA, ensure the issuing CA is included in your system’s trust store.

  • Check the validity of your certificates. They might have expired.
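These checks can be made from the command line with openssl. The snippet below generates a throwaway self-signed CA purely for illustration; in practice, substitute your own certificate and CA bundle:

```shell
# Generate a throwaway self-signed CA for illustration (filenames are arbitrary)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo-ca.key -out demo-ca.crt -subj "/CN=demo-ca"

# Check the validity window of a certificate
openssl x509 -in demo-ca.crt -noout -dates

# Verify a certificate against a CA bundle
# (a self-signed CA verifies against itself)
openssl verify -CAfile demo-ca.crt demo-ca.crt
```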

x509: certificate is not valid for any names

This error indicates that the certificate you are using is not valid for the specific domain or IP address you are trying to use it with. This error typically occurs when there is a mismatch between the certificate’s Subject Alternative Name (SAN) or Common Name (CN) field and the name being used to access the broker.

To fix this error, you may need to obtain a new certificate that is valid for the specific domain or IP address you are using. Ensure that the certificate’s SAN or CN entry matches the name being used, and that the certificate is not expired or revoked.
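You can confirm which names a certificate covers by printing its SANs with openssl. The snippet below generates a throwaway certificate purely for illustration; in practice, inspect your own broker certificate:

```shell
# Generate a throwaway certificate with SAN entries (for illustration only)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo.key -out demo.crt -subj "/CN=broker.example.com" \
  -addext "subjectAltName=DNS:broker.example.com,IP:192.168.1.10"

# Print the SANs; the name you use to reach the broker must appear here
openssl x509 -in demo.crt -noout -ext subjectAltName
```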

cannot validate certificate for 127.0.0.1

This error appears when you try to establish an internal connection to localhost (127.0.0.1) using a certificate that does not include an IP SAN for that address. For example:

unable to request metadata: unable to dial: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

To fix this error, you must either specify the URL with a public domain or use self-signed certificates:

rpk cluster info \
-X brokers=<redpanda-url>:<port> \
-X tls.enabled=true

SASL issues

This section addresses errors related to SASL (Simple Authentication and Security Layer), focusing on connection and authentication problems.

Is SASL missing?

This error appears when you try to interact with a cluster that has SASL enabled without passing a user’s credentials.

unable to request metadata: broker closed the connection immediately after a request was issued, which happens when SASL is required but not provided: is SASL missing?

If you’re using rpk, make sure to specify the -X user, -X pass, and -X sasl.mechanism flags.
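For example; the credentials are placeholders, and SCRAM-SHA-256 is one of the supported mechanisms:

```shell
rpk cluster info \
  -X user=<username> \
  -X pass=<password> \
  -X sasl.mechanism=SCRAM-SHA-256
```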

For all available flags, see the rpk options reference.

Shadow link issues

This section addresses common issues encountered when creating and monitoring shadow links.

pattern_type is unspecified

When creating a shadow link with rpk shadow create, you may see:

Invalid cluster link configuration: pattern_type is unspecified

Ensure that pattern_type values are uppercase: LITERAL or PREFIX.
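As an illustration, a filter entry might look like the following; field names other than pattern_type are hypothetical, so check the shadow link reference for the exact schema:

```yaml
# Hypothetical fragment; only the pattern_type values come from the error message
topic_filters:
  - name: orders
    pattern_type: LITERAL   # must be uppercase: LITERAL or PREFIX
```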

broker_not_available with TLS enabled

When creating a shadow link with TLS enabled, you may see:

Cluster link unreachable, preflight check failed - { node: -1 }, { error_code: broker_not_available [8] }

The shadow cluster cannot verify the source cluster’s TLS certificate. This is the most common issue when using TLS with self-signed certificates (the default for Kubernetes deployments with tls.certs.default.caEnabled=true).

Ensure that the shadow link configuration includes the source cluster’s CA certificate.
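As a sketch, the TLS portion of a shadow link configuration might carry the CA like this; apart from tls_settings.enabled, the field names are assumptions, so verify them against the shadow link reference for your Redpanda version:

```yaml
# Hypothetical fragment; verify field names for your Redpanda version
tls_settings:
  enabled: true
  ca: |
    -----BEGIN CERTIFICATE-----
    <contents of the source cluster's CA certificate>
    -----END CERTIFICATE-----
```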

Wrong SSL version number

When creating a shadow link, you may see in the source cluster logs:

Disconnected (applying protocol, Wrong SSL Version number: ensure client is configured to use TLS)

The source cluster requires TLS but your shadow link configuration is missing TLS settings or has tls_settings.enabled: false.

broker_not_available without TLS

When creating a shadow link without TLS, you may see:

Cluster link unreachable, preflight check failed - { node: -1 }, { error_code: broker_not_available [8] }

Verify that bootstrap_servers addresses are reachable from the shadow cluster and that ports are correct.
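Basic TCP reachability can be checked from a shadow cluster host before involving Redpanda; the host and port are placeholders:

```shell
# Test TCP reachability of a source broker (host and port are placeholders)
nc -vz <source-broker-host> 9092
```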

Connection timeout

When creating a shadow link, the command may hang or timeout without completing.

Check network connectivity between shadow and source clusters. Verify firewall rules and network policies allow traffic between the namespaces.

Topics in FAULTED state

When monitoring shadow links, you may see topics showing FAULTED state in status output.

Common causes include:

  • Source topic deleted: topic no longer exists on source cluster

  • Permission denied: shadow link service account lacks required permissions

  • Network interruption: temporary connectivity issues

If the source topic still exists and should be replicated, delete and recreate the shadow link to reset the faulted state.
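A reset might look like the following sketch; the link name is illustrative, and the exact subcommands should be confirmed with rpk shadow --help:

```shell
# Illustrative only; confirm subcommand names with `rpk shadow --help`
rpk shadow delete <link-name>
# Recreate the link with the same configuration you used originally
rpk shadow create <link-name>
```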

High replication lag

When monitoring shadow links, you may see LAG values continuously increasing in rpk shadow status.

Check the following:

  • Source cluster load: a high produce rate may exceed replication capacity

  • Shadow cluster resources: CPU, memory, or disk constraints may limit throughput

  • Network bandwidth: verify that there is sufficient bandwidth between clusters

To resolve:

  • Scale shadow cluster resources if constrained

  • Verify network connectivity and bandwidth

  • Review topic configuration for optimization opportunities
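Lag can be watched per link with rpk; the link name is a placeholder:

```shell
# Poll replication status every five seconds (link name is a placeholder)
watch -n 5 rpk shadow status <link-name>
```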

Tasks in LINK_UNAVAILABLE state

When monitoring shadow links, you may see tasks showing LINK_UNAVAILABLE state with a "No brokers available" message.

Common causes include:

  • Source cluster requires SASL authentication, but the shadow link is not configured for it

  • Source cluster unreachable from shadow cluster

  • Network policy blocking traffic between clusters

To resolve:

  • Verify SASL configuration if source cluster requires authentication

  • Test network connectivity: kubectl exec into shadow pod and try connecting to source cluster

  • Check Kubernetes NetworkPolicies and firewall rules
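For example, the connectivity test could be run from inside a shadow cluster pod; the pod, namespace, and broker address are placeholders:

```shell
kubectl exec -it <shadow-pod> -n <namespace> -- \
  rpk cluster info -X brokers=<source-broker>:9092
```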

Application connection failures after failover

Applications may not be able to connect to the shadow cluster after failover.

Confirm authentication credentials are valid for the shadow cluster and test network connectivity from application hosts.

Consumer group offset issues after failover

After failover, consumers may start from the beginning or wrong positions.

If necessary, manually reset offsets to appropriate positions. See How to manage consumer group offsets in Redpanda for detailed reset procedures.
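For example, a group's offsets can be repositioned with rpk group seek; the group name is a placeholder, and --to also accepts end or a timestamp:

```shell
# Reset the group's committed offsets to the earliest available position
rpk group seek <group-name> --to start
```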