Embedded cluster management
StackState Self-hosted v5.0.x

Overview

This page describes a number of advanced cluster administration scenarios.

Longhorn

Storage within a kURL embedded cluster is managed by Longhorn (kurl.sh). Longhorn (longhorn.io) is a storage manager originally developed by Rancher and now a CNCF incubating project. By default, Longhorn keeps 3 replicas of each volume within the cluster, so the failure of a single node does not cause data loss. For the backend services of StackState, we've configured Longhorn to use 2 replicas per volume to reduce disk space requirements.
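The replica count for new volumes typically comes from the numberOfReplicas parameter of the Longhorn storage class used to provision them. As a minimal sketch, assuming the standard Longhorn numberOfReplicas storage class parameter, the configured counts can be checked with:

  # List storage classes with their Longhorn replica counts (numberOfReplicas parameter).
  kubectl get storageclass -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner,REPLICAS:.parameters.numberOfReplicas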

Access the Longhorn UI

To manage the Longhorn storage cluster, expose the Longhorn UI as follows:
  1. From your local machine, SSH into the first node and forward an open local port (30881 in this example) to the same port on the node:

     ssh -L 30881:localhost:30881 USER@NODE_ADDRESS

  2. Start the Longhorn UI by scaling up its deployment:

     kubectl scale -n longhorn-system deployment longhorn-ui --replicas=1

  3. Start a port-forward to the longhorn-frontend service using the port number chosen in step 1, 30881 in this case:

     kubectl port-forward -n longhorn-system service/longhorn-frontend 30881:80

  4. On your local machine, connect to the Longhorn UI at http://localhost:30881/
The UI should look something like this: (screenshot: Longhorn UI dashboard)
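If the UI doesn't come up, a quick sanity check is to confirm that the scaled-up deployment has a running pod. This assumes the standard app=longhorn-ui label used by the Longhorn install:

  # The longhorn-ui pod should be Running before the port-forward will work.
  kubectl get pods -n longhorn-system -l app=longhorn-ui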

Stop the Longhorn UI

To stop the Longhorn UI, scale down its deployment:
kubectl scale -n longhorn-system deployment longhorn-ui --replicas=0
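To confirm the UI has stopped, the deployment should report zero replicas:

  # READY should show 0/0 once the UI pod has terminated.
  kubectl get deployment -n longhorn-system longhorn-ui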

Use kubectl to get Longhorn information

The kubectl command can be used to get more information about the Longhorn storage system. For example:
  • View all nodes:

    kubectl get -n longhorn-system nodes.v1beta1.longhorn.io

  • View detailed information for a node:

    kubectl describe -n longhorn-system node.v1beta1.longhorn.io NODENAME

  • View all volumes:

    kubectl get -n longhorn-system volumes.v1beta1.longhorn.io

  • View all replicas:

    kubectl get -n longhorn-system replicas.v1beta1.longhorn.io
It is also possible to change settings with kubectl. For example, the following script can be used to set the reserved space for all nodes to 10Gi:
#!/usr/bin/env bash
set -Exo pipefail

NODE_TYPE="nodes.v1beta1.longhorn.io"
LONGHORN_NAMESPACE="longhorn-system"
STORAGE_RESERVED=10737418240 # 10 Gi

NODES=$(kubectl get ${NODE_TYPE} -n ${LONGHORN_NAMESPACE} -o name)

kubectl patch -n ${LONGHORN_NAMESPACE} ${NODES} --type='json' -p "[{\"op\": \"replace\", \"path\": \"/spec/disks/default-disk-ca1000000000/storageReserved\", \"value\": ${STORAGE_RESERVED}}]"
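The disk name used in the patch path (default-disk-ca1000000000 above) can differ between installations. A hedged way to look up the actual disk names, and to check the reserved space after patching, is to inspect the disks map of a Longhorn node object (NODENAME is a placeholder for one of the names returned by the first command):

  # List the Longhorn node objects.
  kubectl get -n longhorn-system nodes.v1beta1.longhorn.io -o name

  # Show the disk map of one node, including each disk's name and storageReserved value.
  kubectl get -n longhorn-system nodes.v1beta1.longhorn.io NODENAME -o jsonpath='{.spec.disks}'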

Change the number of replicas for Longhorn persistent volumes

  1. Scale down all StackState deployments and delete all StackState statefulsets. The deletion is required because some immutable fields of the statefulsets have to be changed; the data stored on the persistent volumes won't be deleted:

     kubectl scale -n default deployment -l kots.io/app-slug=stackstate --replicas=0
     kubectl delete -n default statefulsets -l kots.io/app-slug=stackstate

  2. Select the required storage class in the Persistent volume settings section of the KOTS Config UI.
  3. Redeploy the application via the KOTS Config UI.
  4. On the Volume tab of the Longhorn UI, change the number of replicas of the existing persistent volumes to match the selected storage class:
     • 1 replica for volumes using the longhorn-single-replica storage class.
     • 2 replicas for volumes using the longhorn-stackstate (two-replica) storage class.
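The configured replica count per volume can also be read back with kubectl instead of the UI. This sketch assumes the numberOfReplicas field of the Longhorn v1beta1 volume spec:

  # Show each Longhorn volume with its configured number of replicas.
  kubectl get -n longhorn-system volumes.v1beta1.longhorn.io -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.numberOfReplicas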

Node management

Restart a node

To restart a node:
  1. Log in to the node.
  2. Run the following command to drain the node. Note that this can cause the StackState application to not function correctly while pods are rescheduled on other nodes:

     sudo /opt/ekco/shutdown.sh

  3. Restart the node.
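After the node is back up, verify from another node that it rejoins the cluster and becomes Ready again. If it remains cordoned (SchedulingDisabled), scheduling can be re-enabled manually:

  # The restarted node should report Ready.
  kubectl get nodes

  # Only needed if the node is still marked SchedulingDisabled after the restart.
  kubectl uncordon NODENAME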

Add a node

When the embedded Kubernetes cluster is installed on the first node, two commands are generated at the end of the installation procedure:
  • Use the first command to join a new machine as a worker node (valid for 24 hours).
  • Use the second command to join a new machine as another master node (valid for 2 hours).
If those commands have expired:
  1. Log in to the KOTS console.
  2. Select Cluster Management in the top bar.
  3. Scroll down to the Add a Node section.
  4. Select either Primary Node (for a master node) or Secondary Node (for a worker node).
  5. Run the generated command on the machine you wish to join the cluster.
More information about adding a node can be found in the kURL documentation on adding nodes (kurl.sh).
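Once the join command has finished, the new node should appear in the cluster, and Longhorn should register a matching node object so it can schedule replicas there:

  # The new node should eventually report Ready.
  kubectl get nodes -o wide

  # Longhorn registers the node for replica scheduling.
  kubectl get -n longhorn-system nodes.v1beta1.longhorn.io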

Remove a node

  • To prevent data loss, ensure that the data has been replicated to another node before removing a node from the cluster (a command-line check is shown below).
  • Master nodes cannot be removed safely. The kURL documentation on etcd cluster health (kurl.sh) stresses that it is important to maintain quorum; however, StackState has so far been unable to remove a master node without breaking the cluster.
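The replication status can be checked from the command line as well as in the Longhorn UI. This sketch assumes the robustness field of the Longhorn v1beta1 volume status, which reports healthy or degraded:

  # All volumes should report healthy robustness before a node is removed.
  kubectl get -n longhorn-system volumes.v1beta1.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness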
To remove a node, follow these steps:
  1. Add another node so that the data that will be removed from this node can be replicated somewhere.
  2. In the Longhorn UI, navigate to the Node tab.
  3. Select the node you want to remove.
  4. Click the Edit Node button at the top of the list.
  5. In the modal dialog box, select Disable under Node Scheduling and True under Eviction Requested.
  6. Click the Save button.
  7. The Node tab should now show that the new node is being assigned volume replicas.
  8. Navigate to the Volume tab to see the volumes being copied over.
  9. Navigate back to the Node tab and wait until the node to remove has no more replicas assigned to it.
  10. Log in to the node to be removed.
  11. Run the following command to drain the node. Note that this can cause the StackState application to not function correctly while pods are rescheduled on other nodes:

      sudo /opt/ekco/shutdown.sh

  12. Stop the node.
  13. On one of the master nodes, run the following command to completely remove the stopped node from the cluster:

      ekco-purge-node NODENAME
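After the purge completes, it can be confirmed from one of the remaining master nodes that the node has been removed from both Kubernetes and Longhorn, and that the volumes return to a healthy state once their replicas are rebuilt:

  # The removed node should no longer be listed by either command.
  kubectl get nodes
  kubectl get -n longhorn-system nodes.v1beta1.longhorn.io

  # Volume robustness should return to healthy after replicas are rebuilt.
  kubectl get -n longhorn-system volumes.v1beta1.longhorn.io -o custom-columns=NAME:.metadata.name,ROBUSTNESS:.status.robustness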
For more information about removing a node, see the kURL and Longhorn documentation.