Debug telemetry synchronization
This page describes StackState version 5.0.
Go to the documentation for the latest StackState release.
This page explains how telemetry data flows through StackState and how to troubleshoot issues with telemetry synchronization.
If telemetry data is not available in StackState, follow the steps below to pinpoint the issue.
Identify the scale of impact
The first step in troubleshooting a telemetry issue is to identify whether all metrics are missing or only metrics from a single integration. To do this:
Click through the topology in the StackState UI to check which components have telemetry available. If telemetry is missing for a single integration only, this will be clear in the elements and views associated with this integration.
Open the telemetry inspector and adjust the selected metric and filters to check whether any telemetry data is available.
Metrics from all integrations that run through StackState Agent (push-based) can be found in the data source StackState Metrics.
Metrics from integrations that run through StackState plugins or the Prometheus mirror (pull-based) can be found in the associated data source that has been configured in the StackState Settings.
If the problem relates to a single integration:
If the affected integration runs through StackState Agent (push-based):
Start by checking the StackState Agent.
Confirm that telemetry data has arrived in Elasticsearch.
Check the filters in the telemetry stream configuration. These should match the data received from the external source.
If the problem affects all integrations, work through the telemetry flow described below to find the point at which data stops flowing.
Telemetry is either pushed to StackState by a StackState Agent, or pulled from an external data source by a StackState plugin or the Prometheus mirror.
StackState Agent:
Connects to a data source to collect data.
Connects to the StackState Receiver to push collected data to StackState (in JSON format).
StackState Receiver:
Extracts topology and telemetry payloads from the received JSON.
Elasticsearch in StackState:
Stores telemetry data received via the StackState Receiver.
StackState plugins:
Pull data from AWS, Azure, external Elasticsearch, Prometheus or Splunk at the Minimum live stream polling interval (seconds) configured for the data source.
Telemetry stream configuration:
Specifies the telemetry data that should be included in the stream.
For push-based synchronizations, Elasticsearch is queried to retrieve telemetry data.
For pull-based integrations, telemetry data is requested from an external source system by a StackState plugin or the Prometheus mirror.
Attaches retrieved telemetry data to the element in StackState.
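On Kubernetes, a quick first check is to confirm that the pods backing each stage of this flow are up and running. A minimal sketch, assuming StackState is installed in a namespace called stackstate (an assumption; the Agent may also run in a different namespace, depending on how it was deployed):

# List the pods involved in telemetry synchronization
kubectl get pods -n stackstate | grep -E 'receiver|2es|elasticsearch|agent'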
For integrations that run through StackState Agent, StackState Agent is a good place to start an investigation.
The integration can be triggered manually with the command stackstate-agent check <check_name> -l debug in your terminal. This command will not send any data to StackState. Instead, it returns the collected topology and telemetry on standard output, along with any generated log messages.
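For example, assuming a check named disk is configured for the Agent (the check name is only an illustration; substitute the check name used by your integration), the output can be captured to a file for inspection. Depending on your installation, the command may need to be run as root or as the stackstate-agent user.

# Run a single check with debug logging; no data is sent to StackState
sudo stackstate-agent check disk -l debug > /tmp/disk-check.out 2>&1
# Review the collected metrics and any error or warning messages
less /tmp/disk-check.out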
Note that for the Kubernetes and OpenShift integrations, different Agent types supply different sets of metrics.
StackState Agents (node Agents): Supply metrics only from the node on which they are deployed. If cluster checks are not enabled, this includes metrics from kube-state-metrics when it is deployed on the same node.
ClusterCheck Agent: When cluster checks are enabled, supplies metrics from kube-state-metrics.
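On a Kubernetes or OpenShift cluster, a quick way to see which Agent pods are running, and therefore which nodes are supplying metrics, is to list them with kubectl. A minimal sketch, assuming the Agent was installed in a namespace called stackstate and that the pod names contain stackstate-agent (both are assumptions; adjust them to match your installation):

# Node Agents run as a DaemonSet, one pod per node; the cluster check Agent is a separate deployment
kubectl get pods -n stackstate -o wide | grep stackstate-agent
# Inspect the log of a specific Agent pod for errors
kubectl logs -n stackstate <agent-pod-name>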
The StackState Receiver receives JSON data from the StackState Agent.
Telemetry data from push-based integrations is stored in an Elasticsearch index. The naming of the fields within the index is entirely based on the data retrieved from the external source system.
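If you have direct access to Elasticsearch, you can confirm that the telemetry indices exist and are receiving documents. A minimal sketch for a Kubernetes installation, assuming the Elasticsearch service is called stackstate-elasticsearch-master and listens on port 9200 (both names are assumptions; adjust them to match your installation):

# Forward the Elasticsearch port to your local machine
kubectl port-forward -n stackstate svc/stackstate-elasticsearch-master 9200:9200
# In a second terminal, list the indices and their document counts
curl -s 'http://localhost:9200/_cat/indices?v'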
To add telemetry to an element, the filters specified for each telemetry stream attached to an element are used to build a query. For push-based synchronizations, Elasticsearch is queried to retrieve the associated telemetry data. For pull-based synchronizations, the associated StackState plugin queries the external data source directly.
Check that data is available for the selected filters. An update to an external system may result in a change to the name applied to metrics in Elasticsearch or no results being returned when the external data source is queried.
Use auto-complete to select the filters. This ensures that the correct names are entered.
When StackState is deployed on Kubernetes, each process runs in a pod with a descriptive name and writes its logs to standard output.
The following logs may be useful when debugging telemetry synchronization:
There is a pod for the StackState Receiver.
There is a pod for each Kafka-to-Elasticsearch process. These processes are responsible for getting telemetry data into Elasticsearch. Note that there are separate processes for metrics, events, and traces. For example, the pod stackstate-mm2es is responsible for metrics.
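These logs can be tailed directly with kubectl. A sketch, assuming StackState runs in a namespace called stackstate (an assumption; substitute your namespace and the actual pod names from kubectl get pods):

# Follow the StackState Receiver log
kubectl logs -n stackstate -f <stackstate-receiver-pod-name>
# Follow the log of the metrics Kafka-to-Elasticsearch process
kubectl logs -n stackstate -f <stackstate-mm2es-pod-name>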
For details of the StackState Agent log files, see the platform-specific Agent pages.
Check the Agent log for errors connecting to the external source system or StackState.
Check the StackState Receiver log for problems decoding incoming data.
Check the Agent log for hints that it has problems connecting to StackState.
Check the StackState Receiver log for JSON deserialization errors (see the example below).
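On Kubernetes, a quick way to scan the Receiver log for such errors is to grep its output. A sketch, assuming the namespace stackstate (substitute your namespace and the actual Receiver pod name):

# Search the Receiver output for deserialization problems and other errors
kubectl logs -n stackstate <stackstate-receiver-pod-name> | grep -iE 'deserial|error'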
Use the telemetry inspector to check which data is available in Elasticsearch by selecting the data source StackState Multi Metrics. All metrics available in the selected data source are listed under Select. Note that if no data is available for a telemetry stream, the telemetry inspector can still be opened by selecting Inspect from the context menu (the triple dots menu in the top-right corner of the telemetry stream).
If the expected data is not in Elasticsearch, check the StackState Receiver and Kafka-to-Elasticsearch logs for errors.
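It is also possible to query Elasticsearch directly for a specific metric. A minimal sketch, assuming port 9200 has been forwarded as shown earlier, that the multi-metrics data lives in indices whose names start with sts_multi_metrics, and that the metric name is stored in a field called name (the index pattern and field name are assumptions; verify them against the _cat/indices listing and a sample document first):

# Count documents for a given metric name (replace <metric_name>)
curl -s 'http://localhost:9200/sts_multi_metrics*/_count' -H 'Content-Type: application/json' -d '{"query": {"term": {"name": "<metric_name>"}}}'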
In the StackState UI, details of the filters applied to a telemetry stream can be seen by opening the stream's context menu (the triple dots menu in the top-right corner of the telemetry stream).