Agent V2 on Kubernetes

StackState SaaS

Overview

StackState Agent V2

To retrieve topology, events and metrics data from a Kubernetes or OpenShift cluster, you will need to have the following installed in the cluster:

  • StackState Agent V2 on each node in the cluster

  • StackState Cluster Agent on one node

  • StackState Checks Agent on one node

  • kube-state-metrics

To integrate with other services, a separate instance of StackState Agent V2 should be deployed on a standalone VM.

StackState Agent types

The Kubernetes and OpenShift integrations collect topology data from Kubernetes and OpenShift clusters respectively, as well as metrics and events. To achieve this, three types of StackState Agent are used: the Agent, the Checks Agent and the Cluster Agent, each described below.

To integrate with other services, a separate instance of the StackState Agent should be deployed on a standalone VM. It is not currently possible to configure a StackState Agent deployed on a Kubernetes or OpenShift cluster with checks that integrate with other services.

Agent

StackState Agent V2 is deployed as a DaemonSet with one instance on each node in the cluster:

  • Host information is retrieved from the Kubernetes or OpenShift API.

  • Container information is collected from the Docker daemon.

  • Metrics are retrieved from kubelet running on the node and also from kube-state-metrics if this is deployed on the same node.

Checks Agent

The StackState Checks Agent is an additional StackState Agent V2 pod that will run the cluster checks that are configured on the StackState Cluster Agent.

The kubernetes_state check and the AWS check can be configured to run as cluster checks.

Cluster Agent

StackState Cluster Agent is deployed as a Deployment. There is one instance for the entire cluster:

  • Topology and events data for all resources in the cluster are retrieved from the Kubernetes API

  • Control plane metrics are retrieved from the Kubernetes or OpenShift API

Cluster checks configured here are run by the deployed StackState Checks Agent pod.

Setup

Supported Kubernetes versions

StackState Agent v2.18.x supports monitoring the following versions of Kubernetes and OpenShift:

  • Kubernetes:

    • Kubernetes 1.16 - 1.21

    • EKS (with Kubernetes 1.16 - 1.21)

  • OpenShift:

    • OpenShift 4.3 - 4.8

  • Default networking

  • Container runtime:

    • Docker

    • containerd

    • CRI-O

Install

The StackState Agent, Cluster Agent, Checks Agent and kube-state-metrics can be installed together using the StackState Agent Helm Chart:

  • Online install - charts are retrieved from the default StackState chart repository (https://helm.stackstate.io) and images are retrieved from the default StackState image registry (quay.io).

  • Air gapped install - images are retrieved from a local system or registry.

  • Install from a custom image registry - images are retrieved from a configured image registry.

Online install

For an online install, the chart is retrieved from the default StackState chart repository and the images from the default StackState image registry:

  1. If you have not already done so, add the StackState helm repository to the local helm client:

     helm repo add stackstate https://helm.stackstate.io
     helm repo update
  2. Deploy the StackState Agent, Cluster Agent, Checks Agent and kube-state-metrics to namespace stackstate using the helm command below.

helm upgrade --install \
   --namespace stackstate \
   --create-namespace \
   --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
   --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
   --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
   stackstate-agent stackstate/stackstate-agent
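The same three settings can also be kept in a values file rather than passed as individual --set-string flags. The sketch below assumes the dotted keys used in the command above map to nested YAML keys in the usual Helm way. The file name agent-values.yaml and the helper name deploy_agent are hypothetical.

```shell
# Write the three required settings to a values file.
# The file name agent-values.yaml is a hypothetical choice.
cat > agent-values.yaml <<'EOF'
stackstate:
  apiKey: "<STACKSTATE_RECEIVER_API_KEY>"
  cluster:
    name: "<KUBERNETES_CLUSTER_NAME>"
  url: "<STACKSTATE_RECEIVER_API_ADDRESS>"
EOF

# Deploy using the values file instead of individual --set-string flags.
# deploy_agent is a hypothetical helper; it only wraps the helm command above.
deploy_agent() {
  helm upgrade --install \
    --namespace stackstate \
    --create-namespace \
    --values "${1:-agent-values.yaml}" \
    stackstate-agent stackstate/stackstate-agent
}
```

After replacing the placeholders in the file, run deploy_agent to deploy. The file contains the Receiver API key, so it should be stored as carefully as any other secret.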

Air gapped install

If StackState Agent will run in an environment that does not have a direct connection to the Internet, the images required to install the StackState Agent, Cluster Agent, Checks Agent and kube-state-metrics can be downloaded and stored in a local system or image registry.

  1. Internet connection required:

    1. Download or clone the StackState Helm charts repo from GitHub: https://github.com/StackVista/helm-charts

    2. In the Helm charts repo, go to the directory stable/stackstate-agent/installation and use the script backup.sh to back up the required images from StackState. The script pulls all images required to run the stackstate-agent Helm chart, backs each one up to an individual tar archive and adds all tars to a single tar.gz archive. The tar.gz archive is written to the working directory from which the script was executed. It is advised to run the script from the stable/stackstate-agent/installation directory, as this simplifies importing the images on the destination system.

      • By default, the backup script retrieves charts from the StackState chart repository (https://helm.stackstate.io) and images from the default StackState image registry (quay.io). The script can be executed from the installation directory as simply ./backup.sh.

          Back up helm chart images to a tar.gz archive for easy transport via an external storage device.
        
          Arguments:
              -c : Helm chart (default: stackstate/stackstate-agent)
              -h : Show this help text
              -r : Helm repository (default: https://helm.stackstate.io)
              -t : Dry-run
      • Add the -t (dry-run) parameter to the script to give a predictive output of what work will be performed, for example:

        ./backup.sh -t
        Backing up quay.io/stackstate/stackstate-agent-2:2.18.0 to stackstate/stackstate-agent-2__2.18.0.tar (dry-run)
        Backing up quay.io/stackstate/stackstate-process-agent:4.0.7 to stackstate/stackstate-process-agent__4.0.7.tar (dry-run)
        Backing up quay.io/stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032 to stackstate/kube-state-metrics__2.3.0-focal-20220316-r61.20220418.2032.tar (dry-run)
        Backing up quay.io/stackstate/stackstate-agent-cluster-agent:2.18.0 to stackstate/stackstate-agent-cluster-agent__2.18.0.tar (dry-run)
        Backing up quay.io/stackstate/stackstate-agent-2:2.18.0 to stackstate/stackstate-agent-2__2.18.0.tar (dry-run)
        Images have been backed up to stackstate.tar.gz
  2. No internet connection required:

    1. Transport images to the destination system.

      • Copy the StackState Helm charts repo, including the tar.gz generated by the backup script, to a storage device for transportation. If the backup script was run from the stable/stackstate-agent/installation directory as advised, the tar.gz will be located at stable/stackstate-agent/installation/stackstate.tar.gz.

      • Copy the Helm charts repo and tar.gz from the storage device to a working folder of choice on the destination system.

    2. Import images to the system, and optionally push to a registry.

      • On the destination system, go to the directory in the StackState Helm charts repo that contains both the scripts and the generated tar.gz archive. By default, this will be stable/stackstate-agent/installation.

      • Execute the import.sh script. Note that the import script must be located in the same directory as the tar.gz archive to be imported. The following must be specified:

        • -b - path to the tar.gz to be imported

        • -d - the destination Docker image registry

      • Additional options when running the script:

        • -p - push images to the destination registry. When not specified, images will be imported and tagged, but remain on the local machine.

        • -t - Dry-run. Use to show the work that will be performed without any action being taken.

Example script usage

In the example below, the StackState Agent images will be extracted from the archive stackstate.tar.gz, imported by Docker, and re-tagged to the registry given by the -d flag, in this example, localhost. The -t argument (dry-run) is provided to show the work that will be performed:

./import.sh -b stackstate.tar.gz -d localhost -t

Unzipping archive stackstate.tar.gz
x stackstate/
x stackstate/stackstate-process-agent__4.0.7.tar
x stackstate/stackstate-agent-2__2.18.0.tar
x stackstate/kube-state-metrics__2.3.0-focal-20220316-r61.20220418.2032.tar
x stackstate/stackstate-agent-cluster-agent__2.18.0.tar
Restoring stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032 from kube-state-metrics__2.3.0-focal-20220316-r61.20220418.2032.tar (dry-run)
Imported quay.io/stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032
Tagged quay.io/stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032 as localhost/stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032
Untagged: quay.io/stackstate/kube-state-metrics:2.3.0-focal-20220316-r61.20220418.2032
Restoring stackstate/stackstate-agent-2:2.18.0 from stackstate-agent-2__2.18.0.tar (dry-run)
Imported quay.io/stackstate/stackstate-agent-2:2.18.0
Tagged quay.io/stackstate/stackstate-agent-2:2.18.0 as localhost/stackstate/stackstate-agent-2:2.18.0
Untagged: quay.io/stackstate/stackstate-agent-2:2.18.0
Restoring stackstate/stackstate-agent-cluster-agent:2.18.0 from stackstate-cluster-agent__2.18.0.tar (dry-run)
Imported quay.io/stackstate/stackstate-agent-cluster-agent:2.18.0
Tagged quay.io/stackstate/stackstate-agent-cluster-agent:2.18.0 as localhost/stackstate/stackstate-cluster-agent:2.18.0
Untagged: quay.io/stackstate/stackstate-agent-cluster-agent:2.18.0
Restoring stackstate/stackstate-process-agent:4.0.7 from stackstate-process-agent__4.0.7.tar (dry-run)
Imported quay.io/stackstate/stackstate-process-agent:4.0.7
Tagged quay.io/stackstate/stackstate-process-agent:4.0.7 as localhost/stackstate/stackstate-process-agent:4.0.7
Untagged: quay.io/stackstate/stackstate-process-agent:4.0.7
Images have been imported up to localhost
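The import steps above can be sketched as two small helpers: one lists the image archives inside the backup before anything is imported, the other runs import.sh as a dry run and only pushes if the dry run succeeds. The helper names list_backup_images and import_and_push are hypothetical. Only the import.sh flags described above (-b, -d, -p, -t) are used, and the sketch assumes it is run from the directory that contains import.sh.

```shell
# List the per-image tar archives bundled in the backup, without extracting them.
list_backup_images() {
  tar -tzf "${1:-stackstate.tar.gz}" | grep '\.tar$'
}

# Dry-run the import first (-t); push to the registry (-p) only if that succeeds.
import_and_push() {
  archive="$1"
  registry="$2"
  ./import.sh -b "$archive" -d "$registry" -t &&
    ./import.sh -b "$archive" -d "$registry" -p
}
```

For example, list_backup_images stackstate.tar.gz followed by import_and_push stackstate.tar.gz localhost would mirror the dry run shown above and then push the images to the registry at localhost.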

Install from a custom image registry

If needed, the images required to install the StackState Agent, Cluster Agent, Checks Agent and kube-state-metrics can be served from a custom image registry. To do this, follow the instructions to install from a custom image registry.

Helm chart values

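Additional variables can be added to the standard helm command used to deploy the StackState Agent, Cluster Agent, Checks Agent and kube-state-metrics. Two commonly used examples, stackstate.cluster.authToken and agent.containerRuntime.customSocketPath, are described below.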

Details of all available helm chart values can be found in the Cluster Agent Helm Chart documentation (github.com).

stackstate.cluster.authToken

It is recommended to provide a stackstate.cluster.authToken in addition to the standard helm chart variables when the StackState Agent is deployed. This variable is optional. However, if it is not provided, a new random value will be generated each time a helm upgrade is performed, which could leave some pods in the cluster with an incorrect configuration.

For example:

helm upgrade --install \
  --namespace stackstate \
  --create-namespace \
  --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
  --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
  --set-string 'stackstate.cluster.authToken'='<CLUSTER_AUTH_TOKEN>' \
  --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
  stackstate-agent stackstate/stackstate-agent
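One way to keep the token stable across upgrades is to generate it once and reuse the stored value. The sketch below is an illustration only: the helper name cluster_auth_token, the file location and the 32-character hexadecimal format are all assumptions, since no particular token format is prescribed here.

```shell
# File used to persist the token between upgrades (hypothetical location).
TOKEN_FILE="${TOKEN_FILE:-./cluster-auth-token}"

# Print the stored token, generating a 32-character hex value on first use.
cluster_auth_token() {
  if [ ! -s "$TOKEN_FILE" ]; then
    # 16 random bytes rendered as hex, with spaces and newlines stripped.
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$TOKEN_FILE"
    chmod 600 "$TOKEN_FILE"
  fi
  cat "$TOKEN_FILE"
}
```

The value could then be passed to the command above as --set-string 'stackstate.cluster.authToken'="$(cluster_auth_token)", so that every helm upgrade reuses the same token.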

agent.containerRuntime.customSocketPath

It is not necessary to configure this property if your cluster uses one of the default socket paths (/var/run/docker.sock, /var/run/containerd/containerd.sock or /var/run/crio/crio.sock).

If your cluster uses a custom socket path, you can provide it using the key agent.containerRuntime.customSocketPath. For example:

helm upgrade --install \
  --namespace stackstate \
  --create-namespace \
  --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
  --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
  --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
  --set-string 'agent.containerRuntime.customSocketPath'='<CUSTOM_SOCKET_PATH>' \
  stackstate-agent stackstate/stackstate-agent
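To decide whether a custom socket path is needed, the default paths listed above can be checked on a cluster node. The helper below is a hypothetical sketch: it tests only whether a path exists, not whether it is a working runtime socket, and the optional root prefix argument is an assumption added for nodes whose host filesystem is mounted under another directory.

```shell
# Print the first default runtime socket path found under an optional root
# prefix. Returns 1 if none exists, in which case a custom path is likely needed.
find_default_socket() {
  root="${1:-}"
  for sock in \
    /var/run/docker.sock \
    /var/run/containerd/containerd.sock \
    /var/run/crio/crio.sock
  do
    if [ -e "${root}${sock}" ]; then
      echo "$sock"
      return 0
    fi
  done
  return 1
}
```

Run find_default_socket directly on a node. If it prints nothing and returns a non-zero status, locate the runtime socket on that node and pass its path as <CUSTOM_SOCKET_PATH>.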

Upgrade

Upgrade Agents

To upgrade the Agents running in your Kubernetes or OpenShift cluster, follow the steps described below.

Redeploy/upgrade Agents with the new stackstate/stackstate-agent chart

The new stackstate/stackstate-agent chart can be used to deploy any version of the Agent. Note that the naming of some values has changed compared to the old stackstate/cluster-agent chart.

  • If this is the first time you will use the new stackstate/stackstate-agent chart to deploy the Agent, follow the instructions to upgrade the Helm chart.

  • If you previously deployed the Agent using the new stackstate/stackstate-agent chart, you can upgrade/redeploy the Agent using the same command used to initially deploy the Agent.

helm upgrade --install \
  --namespace stackstate \
  --create-namespace \
  --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
  --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
  --set-string 'stackstate.cluster.authToken'='<CLUSTER_AUTH_TOKEN>' \
  --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
  --values values.yaml \
  stackstate-agent stackstate/stackstate-agent

Upgrade Helm chart

The stackstate/cluster-agent chart is being deprecated and will no longer be supported. It has been replaced by the new stackstate/stackstate-agent chart.

The naming of some values has changed in the new chart. If you previously deployed the Agent using the stackstate/cluster-agent, follow the steps below to update the values.yaml file and redeploy the Agent with the new stackstate/stackstate-agent chart:

  1. Back up the values.yaml file that was used to deploy with the old stackstate/cluster-agent chart.

  2. Make a copy of the values.yaml file and update the following values in the new file. This will allow you to re-use the previous values while ensuring compatibility with the new chart:

    • clusterChecks has been renamed to checksAgent - the checksAgent now runs by default. The checksAgent section is now only required if you want to disable the Checks Agent.

    • agent has been renamed to nodeAgent.

    • The kubernetes_state check now runs in the Checks Agent by default. It no longer needs to be configured on default installations.

    • For an example of the changes required to the values.yaml file, see the comparison - OLD values.yaml and NEW values.yaml

  3. Uninstall the StackState Cluster Agent and the StackState Agent from your Kubernetes or OpenShift cluster, using a Helm uninstall:

    helm uninstall <release_name> --namespace <namespace>
    
    # If you used the standard install command provided when you installed the StackPack
    helm uninstall stackstate-agent --namespace stackstate
  4. Redeploy the StackState Cluster Agent and the StackState Agent using the updated values.yaml file created in step 2 and the new stackstate/stackstate-agent chart:

helm upgrade --install \
  --namespace stackstate \
  --create-namespace \
  --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
  --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
  --set-string 'stackstate.cluster.authToken'='<CLUSTER_AUTH_TOKEN>' \
  --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
  --values values.yaml \
  stackstate-agent stackstate/stackstate-agent
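The two key renames from step 2 can be applied to a copy of the old file with a short sed sketch. This assumes that clusterChecks and agent appear as top-level keys starting in the first column of the old values.yaml. The helper name migrate_values is hypothetical. It performs only these two renames, so the result should still be reviewed by hand, for example to remove kubernetes_state configuration that is no longer needed.

```shell
# Copy an old values file to a new one, renaming the two top-level keys.
# Only keys that start in column 0 are changed; nested keys are left untouched.
migrate_values() {
  sed -e 's/^clusterChecks:/checksAgent:/' \
      -e 's/^agent:/nodeAgent:/' \
      "$1" > "$2"
}
```

For example, migrate_values values-old.yaml values.yaml writes the renamed copy to values.yaml and leaves the backup from step 1 unchanged.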

Comparison - OLD values.yaml and NEW values.yaml

The old stackstate/cluster-agent chart used to deploy the Agent has been replaced by the new stackstate/stackstate-agent chart. The naming of some values has changed in the new chart. If you were previously deploying the Agent with the old stackstate/cluster-agent chart and a values.yaml file, you should update your values.yaml to match the new naming.

In addition to these changes, the kubernetes_state check runs by default in the Checks Agent when using the new chart (stackstate/stackstate-agent). This no longer needs to be configured on default installations.

Below is an example comparing the values.yaml required by the new chart (stackstate/stackstate-agent) and the old chart (stackstate/cluster-agent) to deploy the Agent with the following configuration:

  • Checks Agent enabled

  • kubernetes_state check running in the Checks Agent

  • AWS check running in the Checks Agent

# checksAgent enabled by default
# (Called clusterChecks in the old stackstate/cluster-agent chart)
# kubernetes_state check disabled by default on regular Agent pods.
clusterAgent:
 config:
   override:
   # kubernetes_state check enabled by default for the Checks Agent.
   # Define the AWS check for the Checks Agent.
   - name: conf.yaml
     path: /etc/stackstate-agent/conf.d/aws_topology.d
     data: |
       cluster_check: true
       init_config:
         aws_access_key_id: ''
         aws_secret_access_key: ''
         external_id: uniquesecret!1
         # full_run_interval: 3600
       instances:
       - role_arn: arn:aws:iam::123456789012:role/StackStateAwsIntegrationRole
         regions:
         - global
         - eu-west-1
         collection_interval: 60

Configure

Advanced Agent configuration

StackState Agent V2 can be configured to reduce data production, tune the process blacklist, or turn off specific features when not needed. The required settings are described in detail on the page advanced Agent configuration.

External integration configuration

To integrate with other external services, a separate instance of the StackState Agent should be deployed on a standalone VM. Other than the kubernetes_state check and the AWS check, it is not currently possible to configure a StackState Agent deployed on a Kubernetes or OpenShift cluster with checks that integrate with other services.

Commands

Agent and Cluster Agent pod status

To check the status of the Kubernetes or OpenShift integration, check that the StackState Cluster Agent (cluster-agent) pod, StackState Checks Agent pod (checks-agent) and all of the StackState Agent (node-agent) pods have status READY.

❯ kubectl get deployment,daemonset --namespace stackstate

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/stackstate-agent-cluster-agent       1/1     1            1           5h14m
deployment.apps/stackstate-agent-checks-agent        1/1     1            1           5h14m
NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/stackstate-agent-node-agent           10        10        10      10           10          <none>          5h14m
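As an alternative to reading the table by eye, kubectl rollout status can block until each workload is ready. The sketch below uses the resource names from the output above. The helper name wait_for_agents and the 120 second timeout are assumptions.

```shell
# Wait until the Cluster Agent, Checks Agent and node Agents are rolled out.
# Returns non-zero as soon as one of them fails to become ready in time.
wait_for_agents() {
  ns="${1:-stackstate}"
  for resource in \
    deployment/stackstate-agent-cluster-agent \
    deployment/stackstate-agent-checks-agent \
    daemonset/stackstate-agent-node-agent
  do
    kubectl rollout status "$resource" --namespace "$ns" --timeout=120s || return 1
  done
}
```

Calling wait_for_agents with no argument checks the stackstate namespace. Pass a different namespace as the first argument if the Agent was deployed elsewhere.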

Agent check status

To find the status of an Agent check:

  1. Find the Agent pod that is running on the node where you would like to find a check status:

    kubectl get pod --output wide
  2. Run the command:

    kubectl exec <agent-pod-name> -n <agent-namespace> -- agent status
  3. Look for the check name under the Checks section.
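The three steps above can be combined into one helper that looks up the Agent pod on a given node and runs agent status in it. This is a sketch under assumptions: the helper name agent_status_on_node is hypothetical, the namespace defaults to stackstate, and the pod is matched on the node-agent part of its name, following the DaemonSet name shown earlier.

```shell
# Run `agent status` in the node Agent pod scheduled on the given node.
agent_status_on_node() {
  node="$1"
  ns="${2:-stackstate}"
  # List pods on the node, then keep the first one whose name contains node-agent.
  pod=$(kubectl get pod --namespace "$ns" \
          --field-selector "spec.nodeName=$node" --output name |
        grep node-agent | head -n 1)
  if [ -z "$pod" ]; then
    echo "no node-agent pod found on node $node" >&2
    return 1
  fi
  kubectl exec "$pod" --namespace "$ns" -- agent status
}
```

For example, agent_status_on_node <node-name> prints the status output, in which the check can then be found under the Checks section.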

Troubleshooting

Log files

Logs for the Agent can be found in the agent pod, where the StackState Agent is running.
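A minimal sketch for reading those logs with kubectl is shown below. The helper name agent_logs, the default namespace and the 200 line limit are assumptions. The --all-containers flag is included on the assumption that an Agent pod may run more than one container.

```shell
# Print recent log lines from every container in an Agent pod.
agent_logs() {
  pod="$1"
  ns="${2:-stackstate}"
  kubectl logs "$pod" --namespace "$ns" --all-containers=true --tail=200
}
```

For example, agent_logs <agent-pod-name> prints the most recent log lines from the named pod in the stackstate namespace.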

Debug mode

By default, the log level of the Agent is set to INFO. To assist in troubleshooting, the Agent log level can be set to DEBUG. This will enable verbose logging and all errors encountered will be reported in the Agent log files.

  • To set the log level to DEBUG for an Agent running on Kubernetes or OpenShift, set 'agent.logLevel'='debug' in the helm command when deploying the Agent.

  • To also include the topology/telemetry payloads sent to StackState in the Agent log, add --set-string 'global.extraEnv.open.STS_LOG_PAYLOADS'='true' to the helm command.

For example:

helm upgrade --install \
   --namespace stackstate \
   --create-namespace \
   --set-string 'stackstate.apiKey'='<STACKSTATE_RECEIVER_API_KEY>' \
   --set-string 'stackstate.cluster.name'='<KUBERNETES_CLUSTER_NAME>' \
   --set-string 'stackstate.url'='<STACKSTATE_RECEIVER_API_ADDRESS>' \
   --set-string 'global.extraEnv.open.STS_LOG_PAYLOADS'='true' \
   --set 'agent.logLevel'='debug' \
   stackstate-agent stackstate/stackstate-agent

Support knowledge base

Troubleshooting steps for any known issues can be found in the StackState support knowledge base.

Uninstall

To uninstall the StackState Cluster Agent and the StackState Agent from your Kubernetes or OpenShift cluster, run a Helm uninstall:

helm uninstall <release_name> --namespace <namespace>

# If you used the standard install command provided when you installed the StackPack
helm uninstall stackstate-agent --namespace stackstate
