Deploy for Production: Automated

If you use automation tools like Terraform and Ansible in your environment, you can use them to quickly provision a Redpanda cluster. Terraform can set up the infrastructure and output a properly-formatted hosts.ini file, and Ansible can use that hosts.ini file as input to install Redpanda.

If you already have an infrastructure provisioning framework, you can supply your own hosts file (without using Terraform), and you can use Ansible to install Redpanda.

This recommended automated deployment provides a production-usable way to deploy and maintain a cluster. For unique configurations, you can work directly with the Ansible and Terraform modules to integrate them into your environment.

Prerequisites

  1. Install Terraform following the Terraform documentation.

  2. Install Ansible following the Ansible documentation. Different operating systems may have specific Ansible dependencies.

  3. Clone the deployment-automation GitHub repository:

    git clone https://github.com/redpanda-data/deployment-automation.git
  4. Change into the directory:

    cd deployment-automation

Use Terraform to set up infrastructure

  • AWS

  • GCP

The recommended Terraform module for Redpanda deploys virtual machines on AWS EC2. To create an AWS Redpanda cluster, review the default variables and make any edits necessary for your environment.

  1. In the deployment-automation folder, change into the aws directory:

    cd aws
  2. Set AWS credentials. Terraform provides multiple ways to set the AWS secret and key. See the Terraform documentation.

  3. Initialize Terraform:

    terraform init
  4. Create the cluster with terraform apply:

    terraform apply -var='public_key_path=~/.ssh/id_rsa.pub' -var='subnet_id=<subnet-id>' -var='vpc_id=<vpc-id>'
    • Terraform configures public_key_path on the brokers to remotely connect with SSH. If the public key path isn’t the default ~/.ssh/id_rsa.pub, then you need to set it.

    • If you don’t have a default VPC defined, then you need to set subnet_id and vpc_id.

For a complete and up-to-date list of configuration options, refer to the (Terraform module):

For acceptable distro names:

data "aws_ami" "ami" {
  most_recent = true

  filter {
    name   = "name"
    values = [
      "ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*",
      "ubuntu/images/hvm-ssd/ubuntu-*-arm64-server-*",
      "Fedora-Cloud-Base-*.x86_64-hvm-us-west-2-gp2-0",
      "debian-*-amd64-*",
      "debian-*-hvm-x86_64-gp2-*'",
      "amzn2-ami-hvm-2.0.*-x86_64-gp2",
      "RHEL*HVM-*-x86_64*Hourly2-GP2"
    ]
  }

  filter {
    name   = "architecture"
    values = [var.machine_architecture]
  }

  filter {
    name   = "name"
    values = ["*${var.distro}*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477", "125523088429", "136693071363", "137112412989", "309956199498"]
  # Canonical, Fedora, Debian (new), Amazon, RedHat
}
  1. In the deployment-automation folder, change into the gcp directory:

    cd gcp
  2. Set GCP credentials. Terraform provides multiple ways to set the GCP secret and key. See the Terraform documentation.

  3. This module generates a properly-configured GCP network within your GCP project by default. You can disable this functionality by commenting out that section of the main.tf and providing your own subnet to var.subnet.

  4. You will need to either create a GCP project or provide a Project ID for an existing project.

  5. Initialize Terraform:

    terraform init
  6. Create the cluster:

    terraform apply

The following example shows how to creates a three-broker cluster:

terraform apply --var="public_key_path=~/.ssh/id_rsa.pub" --var "ssh_user=ubuntu" --var="project_name=$GCP_PROJECT_ID"

For a full, up-to-date list of configuration options, see the (Terraform module):

Use Ansible to install Redpanda

  1. From the deployment-automation folder, set the required Ansible variables:

    export CLOUD_PROVIDER=<your-cloud-provider>
    export DEPLOYMENT_PREFIX=<your-prefix>
    export ANSIBLE_COLLECTIONS_PATHS=${PWD}/artifacts/collections
    export ANSIBLE_ROLES_PATH=${PWD}/artifacts/roles
    export ANSIBLE_INVENTORY=${PWD}/artifacts/hosts_gcp_$DEPLOYMENT_PREFIX.ini
  2. Install the roles required by Ansible:

    ansible-galaxy install -r requirements.yml

Configure a hosts file

Redpanda Data recommends incorporating variables into your $ANSIBLE_INVENTORY file for every host. Edits made to properties outside of the playbook may be overwritten.

If you used Terraform to deploy the instances, the hosts.ini is configured automatically in the artifacts directory.

If you didn’t use Terraform, then you must manually update the [redpanda] section. When you open the file, you see something like the following:

[redpanda]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=0
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1

[monitor]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1

Under the [redpanda] section, replace the following:

Property Description

ip

The public IP address of the machine.

ansible_user

The username for Ansible to use to SSH to the machine.

private_ip

The private IP address of the machine. This could be the same as the public IP address.

You can add additional properties to configure features like rack awareness and Tiered Storage.

The [monitor] section is only required if you want the playbook to install and configure a basic Prometheus and Grafana setup for observability. If you have a centralized monitoring setup or if you don’t require monitoring, then remove this section.

Run a playbook

Use the Ansible Collection for Redpanda to build a Redpanda cluster. The recommended Redpanda playbook enables TLS encryption and Tiered Storage.

If you prefer, you can download the modules and required roles and create your own playbook. For example, if you want to handle your own data directory, you can toggle that part off, and Redpanda ensures that the permissions are correct. If you want to generate your own security certificates, you can.

To install and start a Redpanda cluster in one command with the Redpanda playbook, run:

ansible-playbook --private-key <your-private-key> -v ansible/playbooks/provision-cluster.yml
  • The private key corresponds to the public key in the distro_user SSH configuration.

  • To use your own playbook, replace provision-cluster.yml with your playbook name.

  • When you use a playbook to create a cluster, you should also use the playbook for subsequent operations, like upgrades. The Ansible modules safely handle rolling upgrades, but you must comply with Redpanda version path requirements.

Custom configuration

You can specify any available Redpanda configuration value, or set of values, by passing a JSON dictionary as an Ansible extra-var. These values are spliced with the calculated configuration and only override the values that you specify. Values must be unset manually with rpk. There are two sub-dictionaries you can specify: redpanda.cluster and redpanda.node. For more information, see Cluster Configuration Properties and Broker Configuration Properties.

export JSONDATA='{"cluster":{"auto_create_topics_enabled":"true"},"node":{"developer_mode":"false"}}'
ansible-playbook ansible/<playbook-name>.yml --private-key artifacts/testkey -e redpanda="${JSONDATA}"
Adding whitespace to the JSON breaks configuration merging.

Use rpk and standard Kafka tools to produce and consume from the Redpanda cluster.

Configure Prometheus and Grafana

Include a [monitor] section in your hosts file if you want the playbook to install and configure a basic Prometheus and Grafana setup for observability. Redpanda emits Prometheus metrics that can be scrapped with a central collector. If you already have a centralized monitoring setup or if you don’t require monitoring, then this is unnecessary.

To run the deploy-monitor.yml playbook:

ansible-playbook ansible/deploy-monitor.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>'

Configure Redpanda Console

To install Redpanda Console, add the redpanda_broker role to a group with install_console: true. The standard playbooks automatically install Redpanda Console on hosts in the [client] group.

Build the cluster with TLS enabled

Configure TLS with externally-provided and signed certificates. Then run the provision-cluster-tls.yml playbook, specifying the certificate locations on new hosts. You can either pass the variables in the command line or edit the file and pass them there. Consider whether you want public access to the Kafka API and Admin API endpoints. For example:

ansible-playbook ansible/provision-cluster-tls.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>' \
--extra-vars create_demo_certs=false \
--extra-vars advertise_public_ips=false \
--extra-vars handle_certs=false \
--extra-vars redpanda_truststore_file='<path-to-ca.crt-file>'

It is important to use a signed certificate from a valid CA for production environments. The playbook uses locally-signed certificates that are not recommended for production use. Provide a valid certificate using these variables:

redpanda_certs_dir: /etc/redpanda/certs
redpanda_csr_file: "{{ redpanda_certs_dir }}/node.csr"
redpanda_key_file: "{{ redpanda_certs_dir }}/node.key"
redpanda_cert_file: "{{ redpanda_certs_dir }}/node.crt"
redpanda_truststore_file: "{{ redpanda_certs_dir }}/truststore.pem"

For testing, you could deploy a local CA to generate private keys and signed certificates:

ansible-playbook ansible/provision-cluster-tiered-storage.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>'

Add brokers to an existing cluster

To add brokers to a cluster, you must add them to the hosts file and run the relevant playbook again. You can add skip_node=true to the existing hosts to avoid the playbooks being rerun on them.

Upgrade a cluster

The playbook is designed to be idempotent, so it should be suitable for running as part of a CI/CD pipeline or through Ansible Tower. The playbook upgrades the packages and then performs a rolling upgrade, where one broker at a time is upgraded and safely restarted. For all upgrade requirements and recommendations, see Upgrade Redpanda. It is important to test that your upgrade path is safe before using it in production.

To upgrade a cluster, run the playbook with a specific target version:

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml -e redpanda_version=22.3.10-1

By default, the playbook selects the latest version of the Redpanda packages, but an upgrade is only performed if the redpanda_install_status variable is set to latest:

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml -e redpanda_install_status=latest

To upgrade clusters with SASL authentication:

export JSONDATA='{"cluster":{"auto_create_topics_enabled":"true"},"node":{"developer_mode":"false"}}'
ansible-playbook ansible/<playbook-name>.yml --private-key artifacts/testkey -e redpanda="${JSONDATA}"

Similarly, you can put the redpanda_rpk_opts into a YAML file protected with Ansible vault.

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml --extra-vars=redpanda_install_status=latest --extra-vars @vault-file.yml --ask-vault-pass

Redpanda Ansible Collection values

You can pass the following variables as -e var=value when running Ansible:

Property Default value Description

redpanda_organization

redpanda-test

Set this to identify your organization in the asset management system.

redpanda_cluster_id

redpanda

This helps identify the cluster.

advertise_public_ips

false

Configure Redpanda to advertise the broker’s public IPs for client communication instead of private IPs. This enables using the cluster from outside its subnet.

Note: This is not recommended for production deployments, because your brokers will be public.

grafana_admin_pass

<your-secure-password>

Grafana admin user’s password.

ephemeral_disk

false

Enable file system check for attached disk.

This is useful when using attached disks in instances with ephemeral operating system disks like Azure L Series. This allows a file system repair at boot time and ensures that the drive is remounted automatically after a reboot.

redpanda_mode

production

Enables hardware optimization.

redpanda_admin_api_port

9644

redpanda_kafka_port

9092

redpanda_rpc_port

33145

redpanda_schema_registry_port

8081

is_using_unstable

false

Enables access to unstable builds.

redpanda_version

latest

Version; for example, 22.2.2-1 or 22.3.1~rc1-1. If this value is set, then the package is upgraded if the installed version is lower than what has been specified.

redpanda_rpk_opts

Command line options to be passed to instances where rpk is used on the playbook. For example, superuser credentials can be specified as --user myuser --password mypassword.

redpanda_install_status

present

If redpanda_version is set to latest, then changing redpanda_install_status to latest causes an upgrade; otherwise, the currently-installed version remains.

redpanda_data_directory

/var/lib/redpanda/data

Path where Redpanda keeps its data.

redpanda_key_file

/etc/redpanda/certs/node.key

TLS: Path to private key.

redpanda_cert_file

/etc/redpanda/certs/node.crt

TLS: Path to signed certificate.

redpanda_truststore_file

/etc/redpanda/certs/truststore.pem

TLS: Path to truststore.

tls

false

Set to true to configure Redpanda to use TLS. This can be set on each broker, although this may lead to errors configuring rpk.

skip_node

false

Broker configuration to prevent the redpanda_broker role being applied to this specific broker. Use carefully when adding new brokers to avoid existing brokers from being reconfigured.

restart_node

false

Broker configuration to prevent Redpanda brokers from being restarted after updating. Use with care: This can cause rpk to be reconfigured but the broker is not restarted and therefore is in an inconsistent state.

rack

undefined

Broker configuration to enable rack awareness. Rack awareness is enabled cluster-wide if at least one broker has this set.

tiered_storage_bucket_name

Set bucket name to enable Tiered Storage.

schema_registry_replication_factor

1

The replication factor of Schema Registry’s internal storage topic.

aws_region

The region to be used if Tiered Storage is enabled.

Troubleshooting

On Mac OS X, Python may be unable to fork workers. You may see something like the following:

ok: [34.209.26.177] => {“changed”: false, “stat”: {“exists”: false}}
objc[57889]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[57889]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR! A worker was found in a dead state

Try setting an environment variable to resolve the error:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Next steps