Deploying Rancher on a 3-Node High Availability RKE2 Kubernetes Cluster

A comprehensive guide to setting up Rancher on a highly available RKE2 Kubernetes cluster with Cilium CNI and Kube-vip for robust container management

Introduction

Rancher is a complete container management platform that makes it easy to deploy, manage, and scale containerized applications. In this post, I’ll walk through the process of deploying Rancher on a highly available (HA) 3-node Kubernetes cluster using Rancher Kubernetes Engine 2 (RKE2), with Cilium as the CNI and Kube-vip for control plane load balancing.

What is RKE2?

RKE2, also known as “RKE Government,” is SUSE/Rancher’s next-generation Kubernetes distribution. It combines the best features of RKE (Rancher Kubernetes Engine) with the security-focused approach of K3s, creating a fully conformant Kubernetes distribution with enhanced security features. RKE2 was designed to meet the needs of security-conscious organizations, particularly those in government and regulated sectors.

Key features of RKE2 include:

  • FIPS 140-2 Compliance: Built to satisfy Federal Information Processing Standards for secure computing.
  • CIS Hardening: Optimized to meet the Center for Internet Security’s benchmarks out-of-the-box.
  • Enhanced Security Posture: Minimized attack surface through elimination of unnecessary packages and services.
  • Simpler Installation: Similar to K3s, with a single binary installation process.
  • Robust Architecture: Uses containerd as its container runtime instead of Docker.
  • Automated Certificate Rotation: For enhanced security over time.
  • Enterprise Support: Being a SUSE/Rancher product, it comes with enterprise support options.

This setup provides:

  • High availability with multiple control plane nodes
  • Modern networking with Cilium CNI (replacing kube-proxy)
  • Load balancing of the control plane API using Kube-vip
  • A robust platform for managing multiple Kubernetes clusters with Rancher

Prerequisites

  • Three servers for the RKE2 nodes with Ubuntu/Debian (I’m using Debian 11)
  • Static IP addresses for each server
  • A floating/virtual IP for the API endpoint
  • DNS record for Rancher UI (optional but recommended)
  • SSH access to all nodes

Architecture Overview

Our setup consists of:

  • 3 nodes running RKE2 server (control plane); each node also remains schedulable for workloads
  • A virtual IP managed by Kube-vip (192.168.203.40)
  • Cilium as the CNI, replacing kube-proxy
  • Rancher management platform deployed on the cluster
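
For reference, here is a minimal sketch of the lab layout assumed throughout this post, written as /etc/hosts entries. The hostnames are placeholders; only the IP addresses come from the configuration used below:

# Optional: name resolution for the nodes and the API VIP
# (hostnames here are illustrative placeholders; adjust to your environment)
cat <<EOF >> /etc/hosts
192.168.203.40  rke2-api       # virtual IP managed by Kube-vip
192.168.203.41  rke2-node-0
192.168.203.42  rke2-node-1
192.168.203.43  rke2-node-2
EOF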

Step 1: Setting Up Environment Variables

First, we’ll set up the environment variables on each node. These define our node IPs, API VIP, and other configuration parameters:

export RKE2_API_VIP=192.168.203.40
export RKE2_NODE_0_IP=192.168.203.41
export RKE2_NODE_1_IP=192.168.203.42
export RKE2_NODE_2_IP=192.168.203.43
export NODE_JOIN_TOKEN="RancherRancherer"  # Use a secure token in production
export INTERFACE=eth0
export KUBE_VIP_VERSION=v0.8.9
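
The interface name is an assumption: on many Debian/Ubuntu machines it will be something like ens18 or enp1s0 rather than eth0, so confirm it before proceeding. A quick sanity check:

# Confirm the interface name and that the node IP sits on the expected subnet
ip -br addr show
# The VIP should not answer yet; no reply here means the address is free for Kube-vip
ping -c 2 $RKE2_API_VIP || echo "VIP is not in use"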

Step 2: Configuring the First Control Plane Node

The first node requires special configuration as it initializes the cluster. Let’s create the necessary directories and configuration files:

mkdir -p /etc/rancher/rke2
mkdir -p /var/lib/rancher/rke2/server/manifests/

Creating RKE2 Configuration

Create the RKE2 configuration file:

cat <<EOF | tee /etc/rancher/rke2/config.yaml
token: ${NODE_JOIN_TOKEN}
tls-san:
- ${HOSTNAME}
- ${RKE2_API_VIP}
- ${RKE2_NODE_0_IP}
- ${RKE2_NODE_1_IP}
- ${RKE2_NODE_2_IP}
write-kubeconfig-mode: 600
etcd-expose-metrics: true
disable-kube-proxy: true
cni:
- cilium
disable:
- rke2-ingress-nginx
EOF

Configuring Cilium

Create a HelmChartConfig for RKE2's bundled Cilium chart, enabling kube-proxy replacement and Hubble:

cat <<EOF | tee /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true
    k8sServiceHost: "localhost"
    k8sServicePort: "6443"
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
EOF

Step 3: Installing RKE2 on the First Node

Now let’s install and start RKE2 on the first node:

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -

systemctl enable rke2-server.service
systemctl start rke2-server.service

Monitor the installation progress:

journalctl -u rke2-server -f

Verify that the service is running:

systemctl status rke2-server

Step 4: Setting Up Environment for Kubernetes Interaction

Once RKE2 is running, set up the environment to interact with the cluster:

export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
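
These exports only apply to the current shell. If you want them to survive a re-login, a simple approach (assuming bash as the login shell) is to append them to root's profile:

# Persist the RKE2 paths and endpoints for future shells
cat <<'EOF' >> ~/.bashrc
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
EOF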

Verify that the node is up and running:

kubectl get nodes -o wide
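
Since kube-proxy is disabled, it's also worth confirming that Cilium has taken over service handling once its pods are up. A sketch; note that the in-pod CLI is called cilium on older releases and cilium-dbg on newer ones:

# All Cilium and Hubble pods should reach Running
kubectl -n kube-system get pods | grep -E 'cilium|hubble'
# Check the agent's kube-proxy replacement status (binary name differs by Cilium version)
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxyreplacement \
  || kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -i kubeproxyreplacement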

Step 5: Setting Up Kube-vip for Control Plane HA

To enable high availability for the control plane API endpoint, we’ll deploy Kube-vip:

# Deploy RBAC configuration
curl https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/rke2/server/manifests/kube-vip-rbac.yaml

# Pull Kube-vip image
crictl pull docker.io/plndr/kube-vip:$KUBE_VIP_VERSION

# Create an alias for the kube-vip command
alias kube-vip="ctr --namespace k8s.io run --rm --net-host docker.io/plndr/kube-vip:$KUBE_VIP_VERSION vip /kube-vip"

# Generate and deploy Kube-vip DaemonSet manifest
kube-vip manifest daemonset \
 --arp \
 --interface $INTERFACE \
 --address $RKE2_API_VIP \
 --controlplane \
 --leaderElection \
 --taint \
 --inCluster | tee /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
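
Once RKE2 picks up the manifest, the VIP should start answering for the Kubernetes API. A quick sanity check from any node (any TLS response, including a 401/403, shows the VIP is reaching an API server):

# The kube-vip pods should be running on the control plane node(s)
kubectl -n kube-system get pods | grep kube-vip
# Probe the API server through the virtual IP
curl -ks https://$RKE2_API_VIP:6443/version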

Step 6: Join Additional Control Plane Nodes

On the second and third nodes, make sure the Step 1 environment variables are exported as well (the heredoc below expands them), then create the configuration file that points to the API VIP:

mkdir -p /etc/rancher/rke2

cat <<EOF | tee /etc/rancher/rke2/config.yaml
server: https://${RKE2_API_VIP}:9345
token: ${NODE_JOIN_TOKEN}
tls-san:
- ${HOSTNAME}
- ${RKE2_API_VIP}
- ${RKE2_NODE_0_IP}
- ${RKE2_NODE_1_IP}
- ${RKE2_NODE_2_IP}
write-kubeconfig-mode: 600
etcd-expose-metrics: true
disable-kube-proxy: true
cni:
- cilium
disable:
- rke2-ingress-nginx
EOF

Then install and start RKE2 on each additional node:

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -

systemctl enable rke2-server.service
systemctl start rke2-server.service

Monitor the installation and verify that all nodes join the cluster:

# On the first node
kubectl get nodes -o wide
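
All three nodes should eventually report Ready with the control-plane, etcd, and master roles. With three servers, etcd keeps quorum even if one node is lost; a couple of extra checks worth running once everything has joined:

# Each server runs an etcd static pod; all three should be Running
kubectl -n kube-system get pods -o wide | grep etcd
# Confirm the kube-vip DaemonSet is present on the control plane nodes
kubectl -n kube-system get daemonset | grep kube-vip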

Step 7: Installing Rancher

Now that our HA Kubernetes cluster is running, let’s deploy Rancher. First, add the Rancher Helm repository:

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update
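
These steps assume Helm 3 is already installed on the node you're running them from. If it isn't, one common way to install it is the official helper script (a sketch; review the script before running it):

# Install Helm 3 using the official install script
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 -o get_helm.sh
chmod 700 get_helm.sh
./get_helm.sh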

Create the cattle-system namespace:

kubectl create namespace cattle-system

Installing Cert-Manager (if not already installed)

Rancher requires cert-manager for certificate management:

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true
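
Before installing Rancher, confirm that the cert-manager controller, cainjector, and webhook pods are all Running:

# cert-manager, cert-manager-cainjector and cert-manager-webhook should all be Ready
kubectl -n cert-manager get pods
kubectl -n cert-manager rollout status deploy/cert-manager-webhook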

Installing Rancher

When installing Rancher, you have three options for handling TLS certificates for the Rancher UI:

Option 1: Rancher-generated Certificates

By default, Rancher generates its own CA and uses cert-manager to issue the certificate used to access the Rancher server interface.

helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.plutolab.live \
  --set bootstrapPassword=admin 

Option 2: Installing Rancher with Let’s Encrypt

This option uses cert-manager to automatically request and renew Let’s Encrypt certificates. Let’s Encrypt is a free, publicly trusted CA, so browsers accept the resulting certificate without any extra configuration.

helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=[email protected] \
  --set letsEncrypt.ingress.class=nginx

You need to have port 80 open, as the HTTP-01 challenge can only be completed over port 80.

If you want to use the DNS-01 challenge instead, you need to set up a cluster issuer for Let’s Encrypt. Here’s an example of how to create a cluster issuer using DNS-01 challenge with Cloudflare:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
    - dns01:
        cloudflare:
          email: [email protected]
          apiKeySecretRef:
            name: cloudflare-api-key
            key: api-key
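
The issuer above references a Secret named cloudflare-api-key with a key called api-key. For a ClusterIssuer, cert-manager looks for that secret in its own namespace (cert-manager by default), so create it there first; the file name cluster-issuer.yaml below is just an assumption for wherever you saved the manifest:

# Create the Cloudflare API key secret referenced by the ClusterIssuer
# (replace the placeholder with your real Cloudflare API key)
kubectl -n cert-manager create secret generic cloudflare-api-key \
  --from-literal=api-key='<your-cloudflare-api-key>'

# Apply the ClusterIssuer manifest
kubectl apply -f cluster-issuer.yaml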

Once you have the cluster issuer set up, you can install Rancher with the following command:

helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=secret \
  --set ingress.extraAnnotations.'cert-manager\.io/cluster-issuer'=letsencrypt-dns

Option 3: Installing Rancher with Your Own Certificates

If you have your own certificates, create a secret first:

kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=/path/to/your/cert.pem \
  --key=/path/to/your/key.pem

# Then install Rancher
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.org \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=secret \
  --set privateCA=true
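
Because privateCA=true tells Rancher the certificate was signed by a private CA, Rancher also expects the CA certificate in a secret named tls-ca containing a file called cacerts.pem. If your certificate chain comes from a publicly trusted CA instead, drop privateCA=true and skip this step:

# Provide the private CA certificate to Rancher (only needed with privateCA=true)
kubectl -n cattle-system create secret generic tls-ca \
  --from-file=cacerts.pem=/path/to/your/cacerts.pem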

For this demo, I’m using option 2 with Let’s Encrypt DNS-01 challenge, but you can choose any of the options based on your requirements.

Accessing Rancher

Once the installation completes, you can access Rancher at https://rancher.my.org (or your configured hostname). The initial bootstrap password is “admin”, as set via bootstrapPassword during the Helm install; Rancher will prompt you to choose a new admin password on first login.

Verifying the Installation

Check that all Rancher pods are running correctly:

kubectl -n cattle-system get pods
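
You can also wait on the Rancher deployment itself and confirm the ingress picked up your hostname and TLS configuration:

# Wait for the Rancher deployment to finish rolling out
kubectl -n cattle-system rollout status deploy/rancher
# The ingress should show your Rancher hostname and the TLS secret in use
kubectl -n cattle-system get ingress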

Conclusion

You now have a highly available Rancher deployment running on a 3-node RKE2 cluster with Cilium CNI and Kube-vip for control plane high availability. This setup provides a robust foundation for managing multiple Kubernetes clusters across your organization.

The use of Cilium as a CNI with kube-proxy replacement offers enhanced networking capabilities, while Kube-vip ensures that your control plane remains accessible even if one of the nodes fails.

Rancher provides a comprehensive UI and API for managing all aspects of your Kubernetes clusters, from workload deployments to user management and access control. With this setup, you can easily deploy and manage multiple Kubernetes clusters at scale.

Have you deployed Rancher in your environment? Let me know in the comments how you’ve configured your setup and any challenges you encountered!
