Debug telemetry synchronization

Overview

This page explains how telemetry is synchronized to StackState and how to troubleshoot issues with telemetry synchronization.

Troubleshooting steps

If telemetry data is not available in StackState, follow the steps below to pinpoint the issue.

Identify the scale of impact

The first step in troubleshooting a telemetry issue is to identify whether all metrics are missing or only the metrics from a single integration. To do this:

  1. Click through the topology in the StackState UI to check which components have telemetry available. If telemetry is missing for a single integration only, this will be clear in the elements and views associated with this integration.

  2. Open the telemetry inspector and adjust the selected metric and filters to check if any telemetry data is available.

    • Metrics from all integrations that run through StackState Agent (push-based) can be found in the data source StackState Metrics.

    • Metrics from integrations that run through StackState plugins or the Prometheus mirror (pull-based) can be found in the associated data source that has been configured in the StackState Settings.

If the problem relates to a single integration: focus on that integration, starting with the StackState Agent (for push-based integrations) or the telemetry stream configuration and its data source (for pull-based integrations), both described below.

If the problem affects all integrations: focus on the shared pipeline, starting with the StackState receiver and Elasticsearch, described below.

How telemetry is synchronized

Synchronization process

Telemetry is either pushed to StackState by a StackState Agent, or pulled from an external data source by a StackState plugin or the Prometheus mirror.

  1. StackState Agent:

    • Runs checks for push-based integrations and sends the collected topology and telemetry to the StackState receiver.

  2. StackState receiver:

    • Receives the JSON data sent by StackState Agents and places it on Kafka.

  3. Elasticsearch in StackState:

    • The Kafka-to-Elasticsearch processes store telemetry data from push-based integrations in an Elasticsearch index.

  4. StackState plugins:

    • Pull data from AWS, Azure, external Elasticsearch, Prometheus or Splunk at the Minimum live stream polling interval (seconds) configured for the data source.

  5. Telemetry stream configuration:

    • Specifies the telemetry data that should be included in the stream.

    • For push-based synchronizations, Elasticsearch is queried to retrieve telemetry data.

    • For pull-based integrations, telemetry data is requested from the external source system by a StackState plugin or the Prometheus mirror.

    • Attaches retrieved telemetry data to the element in StackState.

StackState Agent

For integrations that run through StackState Agent, StackState Agent is a good place to start an investigation.

  • Check the StackState Agent log for hints that it has problems connecting to StackState.

  • The integration can be triggered manually by running stackstate-agent check <check_name> -l debug in a terminal on the host where the Agent is installed. This command will not send any data to StackState. Instead, it will print the collected topology and telemetry to standard output along with any generated log messages.
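
For example, a minimal debug run for a hypothetical check named cpu (substitute the name of the check you want to debug) looks like this:

# Run a single check with debug logging; output is printed to standard output
# and nothing is sent to StackState.
stackstate-agent check cpu -l debug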

Note that for the Kubernetes and OpenShift integrations, different Agent types supply different sets of metrics.

  • StackState Agents (node Agents): Supply metrics only from the node on which they are deployed. If cluster checks are not enabled, this includes metrics from kube-state-metrics when it is deployed on the same node.

  • ClusterCheck Agent: When cluster checks are enabled, supplies metrics from kube-state-metrics.
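
On Kubernetes or OpenShift, a quick way to see which Agent pods are running and to inspect their logs is sketched below. The namespace stackstate-agent is an assumption based on a default installation; adjust it and the pod names to match your environment.

# List the Agent pods (the namespace is an assumption, adjust as needed).
kubectl get pods -n stackstate-agent

# Follow the log of a specific Agent pod from the list above to look for
# errors connecting to StackState or running checks.
kubectl logs -f <agent-pod-name> -n stackstate-agent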

StackState receiver

The StackState receiver receives JSON data from the StackState Agent.
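
If Agents report connection problems, check the receiver log on the StackState side. A minimal sketch for a Kubernetes installation, assuming the stackstate namespace (the exact pod name includes a generated suffix):

# Find the receiver pod and follow its log while an Agent is sending data.
kubectl get pods -n stackstate | grep receiver
kubectl logs -f <receiver-pod-name> -n stackstate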

Elasticsearch

Telemetry data from push-based integrations is stored in an Elasticsearch index. The naming of the fields within the index is entirely based on the data retrieved from the external source system.

  • Use the telemetry inspector to check which data is available in Elasticsearch by selecting the data source StackState Multi Metrics. All metrics available in the selected data source are listed under Select. Note that if no data is available for a telemetry stream, the telemetry inspector can still be opened by selecting inspect from the context menu (the triple dots menu in the top-right corner of the telemetry stream).

  • If the expected data is not in Elasticsearch, check the KafkaToES log for errors.
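
To check directly on the Elasticsearch side that metric documents are arriving, the indices can be listed. This sketch assumes the Elasticsearch instance bundled with a Kubernetes installation in the stackstate namespace; the service name stackstate-elasticsearch-master is an assumption and may differ in your environment.

# Forward the Elasticsearch HTTP port to your local machine.
kubectl port-forward service/stackstate-elasticsearch-master 9200:9200 -n stackstate

# In a second terminal, list the indices and their document counts;
# the telemetry indices should be present and growing.
curl "http://localhost:9200/_cat/indices?v"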

Telemetry stream configuration

To add telemetry to an element, the filters specified for each telemetry stream attached to an element are used to build a query. For push-based synchronizations, Elasticsearch is queried to retrieve the associated telemetry data. For pull-based synchronizations, the associated StackState plugin queries the external data source directly.

In the StackState UI, open the telemetry inspector to see details of the applied filters:

  • Check that data is available for the selected filters. An update to an external system may change the names applied to metrics in Elasticsearch, or result in no data being returned when the external data source is queried.

  • Use auto-complete to select the filters. This ensures that the correct names are entered.

Log files

StackState

When StackState is deployed on Kubernetes, the StackState services run in pods with descriptive names and log to standard output.

The following logs may be useful when debugging telemetry synchronization:

  • There is a pod for the StackState Receiver.

  • There is a pod for each Kafka-to-Elasticsearch process. These processes are responsible for getting telemetry data to Elasticsearch. Note that there are processes for metrics, events, and traces. For example, the pod stackstate-mm2es is responsible for metrics.
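
For example, to follow the log of the metrics Kafka-to-Elasticsearch pod on a Kubernetes installation (the stackstate namespace is an assumption; list the pods first to find the exact pod name, which includes a generated suffix):

# List the StackState pods and find the stackstate-mm2es pod.
kubectl get pods -n stackstate | grep mm2es

# Follow its log to check whether metrics are being written to Elasticsearch.
kubectl logs -f <stackstate-mm2es-pod-name> -n stackstate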

➡️ Learn more about StackState logs on Kubernetes

StackState Agent

StackState Agent log files are located in the directory:

/var/log/stackstate-agent/
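
For example, on a Linux host the available log files can be listed and the main Agent log followed (the file name agent.log is an assumption; check the directory listing for the exact names):

# List the Agent log files and follow the main log.
ls /var/log/stackstate-agent/
tail -f /var/log/stackstate-agent/agent.log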

See also
