Automated Deployment

If you use automation tools like Terraform and Ansible in your environment, you can use them to quickly provision a Redpanda cluster. Terraform can set up the infrastructure and output a properly-formatted hosts.ini file, and Ansible can use that hosts.ini file as input to install Redpanda.

If you already have an infrastructure provisioning framework, you can supply your own hosts file (without using Terraform), and you can use Ansible to install Redpanda.

This recommended automated deployment provides a production-usable way to deploy and maintain a cluster. For unique configurations, you can work directly with the Ansible and Terraform modules to integrate them into your environment.

Prerequisites

  1. Install Terraform following the Terraform documentation.

  2. Install Ansible following the Ansible documentation. Different operating systems may have specific Ansible dependencies.

  3. Clone the deployment-automation GitHub repository:

    git clone https://github.com/redpanda-data/deployment-automation.git
  4. Change into the directory:

    cd deployment-automation
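
To confirm that both tools are installed and on your PATH before continuing, you can check their versions:

terraform version
ansible --version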

Use Terraform to set up infrastructure

The repository includes Terraform configurations for both AWS and GCP.

AWS

The recommended Terraform module for Redpanda deploys virtual machines on AWS EC2. To create an AWS Redpanda cluster, review the default variables and make any edits necessary for your environment.

  1. In the deployment-automation folder, change into the aws directory:

    cd aws
  2. Set AWS credentials. Terraform provides multiple ways to supply the AWS access key and secret key; see the Terraform documentation. One option is to export them as environment variables, as shown after this list.

  3. Initialize Terraform:

    terraform init
  4. Create the cluster with terraform apply:

    terraform apply -var='public_key_path=~/.ssh/id_rsa.pub' -var='subnet_id=<subnet-id>' -var='vpc_id=<vpc-id>'
    • Terraform uses the key at public_key_path to configure SSH access to the brokers. If your public key isn’t at the default ~/.ssh/id_rsa.pub, then you need to set this variable.

    • If you don’t have a default VPC defined, then you need to set subnet_id and vpc_id.
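
If you set credentials with environment variables (one of several options; static access keys are an assumption here, and the values below are placeholders), export them before running terraform apply:

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>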

Configuration options (for the full list, see the Terraform module):

Property Description

aws_region

The AWS region to use for deploying the infrastructure. Default: us-west-2

nodes

The number of nodes to base the cluster on. Default: 3

enable_monitoring

Creates a Prometheus/Grafana instance for monitoring the cluster. Default: true

instance_type

The instance type on which Redpanda is deployed. Default: i3.8xlarge

prometheus_instance_type

The instance type on which Prometheus and Grafana are deployed. Default: c5.2xlarge

public_key_path

Path to the public key of the keypair used to access the nodes. Default: ~/.ssh/id_rsa.pub

distro

Linux distribution to install. (This affects the distro_ variables.) Default: ubuntu-focal

distro_ami

AWS AMI to use for each available distribution. These must be changed according to the chosen AWS region.

distro_ssh_user

The user for SSH access to the created EC2 instances.

The acceptable distro names are determined by the AMI name filters in the Terraform module:

data "aws_ami" "ami" {
  most_recent = true

  filter {
    name   = "name"
    values = [
      "ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*",
      "ubuntu/images/hvm-ssd/ubuntu-*-arm64-server-*",
      "Fedora-Cloud-Base-*.x86_64-hvm-us-west-2-gp2-0",
      "debian-*-amd64-*",
      "debian-*-hvm-x86_64-gp2-*'",
      "amzn2-ami-hvm-2.0.*-x86_64-gp2",
      "RHEL*HVM-*-x86_64*Hourly2-GP2"
    ]
  }

  filter {
    name   = "architecture"
    values = [var.machine_architecture]
  }

  filter {
    name   = "name"
    values = ["*${var.distro}*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477", "125523088429", "136693071363", "137112412989", "309956199498"]
  # Canonical, Fedora, Debian (new), Amazon, RedHat
}

GCP

  1. In the deployment-automation folder, change into the gcp directory:

    cd gcp
  2. You need an existing subnet in which to deploy the virtual machines (VMs). The subnet’s attached firewall should allow inbound traffic on ports 22, 3000, 8082, 8888, 8889, 9090, 9092, 9644, and 33145. This module adds the rp-node tag to the deployed VMs, which you can use as the target tag for the firewall rule (a sample command follows this procedure).

  3. Initialize Terraform:

    terraform init
  4. Create the cluster:

    terraform apply

The following example creates a three-broker cluster using the subnet named redpanda-cluster-subnet:

terraform apply -var nodes=3 -var subnet=redpanda-cluster-subnet -var public_key_path=~/.ssh/id_rsa.pub -var ssh_user=$USER
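
If you still need to create the firewall rule described in step 2, a gcloud command along the following lines opens the required ports for VMs tagged rp-node. The rule name, network, and source range are placeholders for your environment:

gcloud compute firewall-rules create redpanda-cluster-ports \
  --network=<your-vpc-network> \
  --allow=tcp:22,tcp:3000,tcp:8082,tcp:8888,tcp:8889,tcp:9090,tcp:9092,tcp:9644,tcp:33145 \
  --target-tags=rp-node \
  --source-ranges=<allowed-cidr>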

Configuration options (for the full list, see the Terraform module):

Property Description

region

The region to use for deploying the infrastructure. Default: us-west1

zone

The region’s zone to deploy the infrastructure. Default: a

subnet

The name of an existing subnet to deploy the infrastructure.

nodes

The number of nodes to base the cluster on. Keep in mind that one node is used as a monitoring node. Default: 1

disks

The number of local disks to deploy on each machine. Default: 1

image

The OS image running on the VMs. Default: ubuntu-os-cloud/ubuntu-1804-lts

machine_type

The machine type. Default: n2-standard-2

public_key_path

Path to the public key of the keypair used to access the nodes.

ssh_user

The SSH user. This must match the user in the public SSH key’s comment.

Use Ansible to install Redpanda

  1. From the deployment-automation folder, set the required Ansible variables:

    export CLOUD_PROVIDER=<aws-or-gcp>
    export ANSIBLE_COLLECTIONS_PATHS=${PWD}/artifacts/collections
    export ANSIBLE_ROLES_PATH=${PWD}/artifacts/roles
    export ANSIBLE_INVENTORY=${PWD}/${CLOUD_PROVIDER}/hosts.ini
  2. Install the roles required by Ansible:

    ansible-galaxy install -r ansible/requirements.yml

Configure a hosts file

Redpanda Data recommends setting variables for every host in your hosts.ini file, because edits made to properties outside of the playbook may be overwritten.

If you used Terraform to deploy the instances, the hosts.ini is configured automatically in the artifacts directory.

If you didn’t use Terraform, then you must manually update the [redpanda] section. When you open the file, you see something like the following:

[redpanda]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=0
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1

[monitor]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1

Under the [redpanda] section, replace the following:

Property Description

ip

The public IP address of the machine.

ansible_user

The username for Ansible to use to SSH to the machine.

private_ip

The private IP address of the machine. This could be the same as the public IP address.

You can add additional properties to configure features like rack awareness and Tiered Storage.
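
For example, a hosts.ini sketch that sets a per-broker rack and group-wide Tiered Storage variables might look like the following. The rack labels, bucket name, and region are placeholders, and the variables themselves are described in the Redpanda Ansible Collection values table below:

[redpanda]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=0 rack=rack-a
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1 rack=rack-b

[redpanda:vars]
tiered_storage_bucket_name=<your-bucket-name>
aws_region=<your-bucket-region>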

The [monitor] section is only required if you want the playbook to install and configure a basic Prometheus and Grafana setup for observability. If you have a centralized monitoring setup or if you don’t require monitoring, then remove this section.

Run a playbook

Use the Ansible Collection for Redpanda to build a Redpanda cluster. The recommended Redpanda playbook enables TLS encryption and Tiered Storage.

If you prefer, you can download the modules and required roles and create your own playbook. For example, if you want to manage your own data directory rather than have the role ensure its permissions are correct, you can toggle that part off. If you want to generate your own security certificates, you can do that too.

To install and start a Redpanda cluster in one command with the Redpanda playbook, run:

ansible-playbook --private-key <your-private-key> -v ansible/playbooks/provision-basic-cluster.yml
  • The private key corresponds to the public key configured for the SSH user (for example, distro_ssh_user on AWS or ssh_user on GCP).

  • To use your own playbook, replace provision-basic-cluster.yml with your playbook name. A minimal sketch of a custom playbook follows this list.

  • When you use a playbook to create a cluster, you should also use the playbook for subsequent operations, like upgrades. The Ansible modules safely handle rolling upgrades, but you must comply with Redpanda version path requirements.
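
A minimal custom playbook might look like the following sketch. It assumes the redpanda_broker role installed by ansible-galaxy and the [redpanda] group from your hosts file; verify the role name against ansible/requirements.yml before relying on it:

# my-cluster.yml: minimal sketch of a custom playbook
- name: Provision a basic Redpanda cluster
  hosts: redpanda
  become: true
  roles:
    - redpanda_broker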

Custom configuration

You can specify any available Redpanda configuration value, or set of values, by passing a JSON dictionary as an Ansible extra-var. These values are spliced with the calculated configuration and only override the values that you specify. Values must be unset manually with rpk. There are two sub-dictionaries you can specify: redpanda.cluster and redpanda.node. For more information, see Cluster Configuration Properties and Node Configuration Properties.

export JSONDATA='{"cluster":{"auto_create_topics_enabled":"true"},"node":{"developer_mode":"false"}}'
ansible-playbook ansible/<playbook-name>.yml --private-key artifacts/testkey -e redpanda="${JSONDATA}"
Adding whitespace to the JSON breaks configuration merging.

Use rpk and standard Kafka tools to produce and consume from the Redpanda cluster.
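
For example, rpk can create a topic, produce a record, and consume it back. The broker address and topic name are placeholders, and newer rpk releases may use -X brokers= instead of the --brokers flag:

rpk topic create test-topic --brokers <broker-ip>:9092
echo "hello redpanda" | rpk topic produce test-topic --brokers <broker-ip>:9092
rpk topic consume test-topic --brokers <broker-ip>:9092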

Configure Prometheus and Grafana

Include a [monitor] section in your hosts file if you want the playbook to install and configure a basic Prometheus and Grafana setup for observability. Redpanda emits Prometheus metrics that can be scraped by a central collector. If you already have a centralized monitoring setup or if you don’t require monitoring, then this is unnecessary.

To run the deploy-prometheus-grafana.yml playbook:

ansible-playbook ansible/deploy-prometheus-grafana.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>'

Configure Redpanda Console

To install Redpanda Console, add the redpanda_broker role to a group with install_console: true. The standard playbooks automatically install Redpanda Console on hosts in the [client] group.
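
For example, a hosts.ini sketch with a [client] section (using the same placeholder fields as the [redpanda] and [monitor] sections) might look like this:

[client]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=2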

Build the cluster with TLS enabled

Configure TLS with externally-provided and signed certificates. Then run the provision-tls-cluster playbook, specifying the certificate locations on the hosts. You can either pass the variables on the command line or set them in the playbook file. Consider whether you want public access to the Kafka API and Admin API endpoints. For example:

ansible-playbook ansible/provision-tls-cluster.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>' \
--extra-vars create_demo_certs=false \
--extra-vars advertise_public_ips=false \
--extra-vars handle_certs=false \
--extra-vars redpanda_truststore_file='<path-to-ca.crt-file>'

It is important to use a signed certificate from a valid CA for production environments. The playbook’s locally-signed demo certificates (controlled by create_demo_certs) are not recommended for production use. Provide a valid certificate using these variables:

redpanda_certs_dir: /etc/redpanda/certs
redpanda_csr_file: "{{ redpanda_certs_dir }}/node.csr"
redpanda_key_file: "{{ redpanda_certs_dir }}/node.key"
redpanda_cert_file: "{{ redpanda_certs_dir }}/node.crt"
redpanda_truststore_file: "{{ redpanda_certs_dir }}/truststore.pem"
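
For example, you could override these paths on the command line when running the TLS playbook. The certificate file locations below are placeholders for files you provision yourself:

ansible-playbook ansible/provision-tls-cluster.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>' \
--extra-vars handle_certs=false \
--extra-vars redpanda_key_file='<path-to-node.key>' \
--extra-vars redpanda_cert_file='<path-to-node.crt>' \
--extra-vars redpanda_truststore_file='<path-to-ca.crt-file>'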

For testing, you could deploy a local CA to generate private keys and signed certificates:

ansible-playbook ansible/provision-tls-cluster.yml \
--private-key '<path-to-a-private-key-with-ssh-access-to-the-hosts>'

Add brokers to an existing cluster

To add brokers to a cluster, you must add them to the hosts file and run the relevant playbook again. You can add skip_node=true to the existing hosts to prevent the playbook from being rerun on them, as shown in the following sketch.
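
For example, a hosts.ini sketch that adds a third broker while skipping the existing ones might look like the following (IP addresses and IDs are placeholders):

[redpanda]
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=0 skip_node=true
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=1 skip_node=true
ip ansible_user=ssh_user ansible_become=True private_ip=pip id=2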

Upgrade a cluster

The playbook is designed to be idempotent, so it should be suitable for running as part of a CI/CD pipeline or through Ansible Tower. The playbook upgrades the packages and then performs a rolling upgrade, where one broker at a time is upgraded and safely restarted. For all upgrade requirements and recommendations, see Upgrade Redpanda. It is important to test that your upgrade path is safe before using it in production.

To upgrade a cluster, run the playbook with a specific target version:

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml -e redpanda_version=22.3.10-1

By default, the playbook selects the latest version of the Redpanda packages, but an upgrade is only performed if the redpanda_install_status variable is set to latest:

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml -e redpanda_install_status=latest

To upgrade clusters with SASL authentication, pass the superuser credentials in redpanda_rpk_opts:

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml --extra-vars redpanda_install_status=latest --extra-vars '{"redpanda_rpk_opts":"--user myuser --password mypassword"}'

Similarly, you can put the redpanda_rpk_opts into a YAML file protected with Ansible vault.

ansible-playbook --private-key ~/.ssh/id_rsa ansible/<playbook-name>.yml --extra-vars=redpanda_install_status=latest --extra-vars @vault-file.yml --ask-vault-pass
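
A sketch of such a vault-protected file, before encrypting it with ansible-vault encrypt, might contain only the rpk options (the credentials are placeholders):

# vault-file.yml
redpanda_rpk_opts: "--user myuser --password mypassword"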

Redpanda Ansible Collection values

You can pass the following variables as -e var=value when running Ansible:

Property Default value Description

redpanda_organization

redpanda-test

Set this to identify your organization in the asset management system.

redpanda_cluster_id

redpanda

This helps identify the cluster.

advertise_public_ips

false

Configure Redpanda to advertise the broker’s public IPs for client communication instead of private IPs. This enables using the cluster from outside its subnet.

Note: This is not recommended for production deployments, because your brokers will be public.

grafana_admin_pass

<your-secure-password>

Grafana admin user’s password.

ephemeral_disk

false

Enable filesystem check for attached disk.

This is useful when using attached disks in instances with ephemeral operating system disks like Azure L Series. This allows a filesystem repair at boot time and ensures that the drive is remounted automatically after a reboot.

redpanda_mode

production

Enables hardware optimization.

redpanda_admin_api_port

9644

redpanda_kafka_port

9092

redpanda_rpc_port

33145

redpanda_schema_registry_port

8081

is_using_unstable

false

Enables access to unstable builds.

redpanda_version

latest

Version; for example, 22.2.2-1 or 22.3.1~rc1-1. If this value is set, then the package is upgraded if the installed version is lower than what has been specified.

redpanda_rpk_opts

Command line options to be passed to instances where rpk is used on the playbook. For example, superuser credentials can be specified as --user myuser --password mypassword.

redpanda_install_status

present

If redpanda_version is set to latest, then changing redpanda_install_status to latest causes an upgrade; otherwise, the currently-installed version remains.

redpanda_data_directory

/var/lib/redpanda/data

Path where Redpanda keeps its data.

redpanda_key_file

/etc/redpanda/certs/node.key

TLS: Path to private key.

redpanda_cert_file

/etc/redpanda/certs/node.crt

TLS: Path to signed certificate.

redpanda_truststore_file

/etc/redpanda/certs/truststore.pem

TLS: Path to truststore.

tls

false

Set to true to configure Redpanda to use TLS. This can be set on each broker, although this may lead to errors configuring rpk.

skip_node

false

Node configuration to prevent the redpanda_broker role from being applied to this specific broker. Use carefully when adding new brokers to prevent existing brokers from being reconfigured.

restart_node

false

Node configuration to prevent Redpanda brokers from being restarted after updating. Use with care: rpk may be reconfigured while the broker is not restarted, leaving the broker in an inconsistent state.

rack

undefined

Node configuration to enable rack awareness. Rack awareness is enabled cluster-wide if at least one broker has this set.

tiered_storage_bucket_name

Set bucket name to enable Tiered Storage.

schema_registry_replication_factor

1

The replication factor of Schema Registry’s internal storage topic.

aws_region

The region to be used if Tiered Storage is enabled.

Troubleshooting

On Mac OS X, Python may be unable to fork workers. You may see something like the following:

ok: [34.209.26.177] => {"changed": false, "stat": {"exists": false}}
objc[57889]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[57889]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
ERROR! A worker was found in a dead state

Try setting an environment variable to resolve the error:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES