This page describes a number of advanced cluster administration scenarios.
Longhorn
Storage within a kURL embedded cluster is managed by Longhorn (kurl.sh). Longhorn (longhorn.io) is a storage manager originally developed by Rancher and now a CNCF incubating project. By default, Longhorn will keep 3 replicas of each volume within the cluster so that the failure of a single node does not cause data to be lost. For the backend services of StackState, we've configured Longhorn to use 2 replicas for each volume to reduce disk space requirements.
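To check which replica count a Longhorn storage class is configured with, you can inspect its numberOfReplicas parameter. A small example that lists all storage classes together with that parameter (classes from other provisioners show <none>):
kubectl get storageclass -o custom-columns=NAME:.metadata.name,REPLICAS:.parameters.numberOfReplicas
If the parameter is not set on a storage class, Longhorn falls back to its global default replica count.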
Access the Longhorn UI
To manage the Longhorn storage cluster, expose the Longhorn UI as follows:
1. From your local machine, SSH into the first node and start a port-forward to an open port on your local machine:
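A minimal sketch, assuming the Longhorn frontend service is named longhorn-frontend in the longhorn-system namespace (the kURL default), that local port 8080 is free, and that <user> and <first-node> stand in for your own SSH user and node address:
ssh -L 8080:localhost:8080 <user>@<first-node>
# on the node, forward the Longhorn frontend service to the tunnelled port:
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
The Longhorn UI is then reachable at http://localhost:8080 in a browser on your local machine.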
Change the number of replicas for Longhorn persistent volumes
1. Scale down all the StackState deployments and delete all StatefulSets. The deletion is required because immutable fields of the StatefulSets have to be changed. The data stored on the persistent volumes won't be deleted:
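As a sketch of what this step can look like, assuming StackState is installed in the stackstate namespace (adjust the namespace to match your installation):
kubectl -n stackstate scale deployment --all --replicas=0
kubectl -n stackstate delete statefulset --all
Deleting a StatefulSet does not delete its PersistentVolumeClaims, so the Longhorn volumes and the data on them stay in place.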
2. On the Volume tab of the Longhorn UI, change the number of replicas for the persistent volumes as follows:
1 replica for volumes using the longhorn-single-replica storage class.
2 replicas for volumes using the longhorn-stackstate storage class (the two-replica volumes).
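To see which persistent volumes belong to which storage class before editing them in the Longhorn UI, they can be listed with kubectl, for example:
kubectl get pv -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,CLAIM:.spec.claimRef.name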
Node management
Restart a node
To restart a node:
1. Log in to the node.
2. Run the following command to drain the node. Note that this can cause the StackState application to not function correctly while pods are rescheduled on other nodes:
sudo /opt/ekco/shutdown.sh
3. Restart the node.
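Once the node is back up, you can verify from any master node that it has rejoined the cluster and reports Ready again:
kubectl get nodes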
Add a node
When the embedded Kubernetes cluster is installed on the first node, two commands are generated at the end of the installation procedure:
Use the first command to join a new machine as a worker node (valid for 24 hours).
Use the second command to join a new machine as another master node (valid for 2 hours).
If those commands have expired:
1. Log in to the KOTS console.
2. Select Cluster Management in the top bar.
3. Scroll down to the Add a Node section.
4. Select either Primary Node (for a master node) or Secondary Node (for a worker node).
5. Run the generated command on the machine you wish to join the cluster.
More information about adding a node can be found in the kURL documentation on adding nodes (kurl.sh).
Remove a node
To prevent data loss, ensure that the data has been replicated to another node before removing a node from the cluster.
Master nodes cannot be removed safely. The kURL documentation on etcd cluster health (kurl.sh) notes that it is important to maintain quorum; however, StackState has so far been unable to remove a master node without breaking the cluster.
To remove a node, follow these steps:
1. Add another node so that the data currently on the node to be removed can be replicated elsewhere.
2. Access the Longhorn UI as described above.
3. Navigate to the Node tab.
4. Select the node to be removed in the list.
5. Click the Edit Node button at the top of the list.
6. In the modal dialog box, select Disable under Node Scheduling and True under Eviction Requested.
7. Click the Save button.
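If you prefer the command line for this step, the same settings live on the Longhorn Node custom resource. This is only a sketch: it assumes the nodes.longhorn.io resources in the longhorn-system namespace expose the spec.allowScheduling and spec.evictionRequested fields (verify against your Longhorn version), and NODENAME is a placeholder for the node to be removed:
kubectl -n longhorn-system patch nodes.longhorn.io NODENAME --type merge -p '{"spec":{"allowScheduling":false,"evictionRequested":true}}'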
8. The Node tab should now show that the new node is being assigned volume replicas.
9. Navigate to the Volume tab to see the volumes being copied over.
10. Navigate back to the Node tab and wait until the node to be removed has no more replicas assigned to it.
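The replicas still assigned to the node can also be checked from the command line. A hedged example, assuming Longhorn stores its Replica custom resources in the longhorn-system namespace and records the owning node in spec.nodeID, with NODENAME as a placeholder:
kubectl -n longhorn-system get replicas.longhorn.io -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeID | grep NODENAME
When this no longer returns any rows, the node holds no more replicas and can be drained.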
11. Log in to the node to be removed.
12. Run the following command to drain the node. Note that this can cause the StackState application to not function correctly while pods are rescheduled on other nodes:
sudo /opt/ekco/shutdown.sh
13. Stop the node.
14. On one of the master nodes, run the following command to completely remove the stopped node from the cluster:
ekco-purge-node NODENAME
For more information about removing a node, see the kURL and Longhorn documentation.