Debug telemetry synchronization
StackState Self-hosted v4.6.x
If telemetry data is not available in StackState, follow the steps below to pinpoint the issue.
Identify the scale of impact
The first step in troubleshooting a telemetry issue is to identify if all metrics are missing or just specific metrics from a single integration. To do this:
- 1. Click through the topology in the StackState UI to check which components have telemetry available. If telemetry is missing for a single integration only, this will be clear in the elements and views associated with this integration.
- 2. Open the telemetry inspector and adjust the selected metric and filters to check if any telemetry data is available.
- Metrics from all integrations that run through StackState Agent (push-based) can be found in the data source StackState Metrics.
- Metrics from integrations that run through StackState plugins or the Prometheus mirror (pull-based) can be found in the associated data source that has been configured in the StackState Settings.
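The scoping step above can be summarized as a small decision helper. This is a minimal sketch and not part of StackState: the function name `classify_scope` and its input (a mapping from an integration name to whether any of its telemetry was visible in the UI) are assumptions made for illustration.

```python
def classify_scope(telemetry_seen: dict[str, bool]) -> str:
    """Classify a telemetry problem from manual observations.

    telemetry_seen maps an integration name (any label you choose) to True
    if at least one of its components showed telemetry in the StackState UI.
    """
    if not telemetry_seen:
        raise ValueError("no integrations were checked")
    # Integrations for which no telemetry was seen, in a stable order.
    missing = sorted(name for name, seen in telemetry_seen.items() if not seen)
    if not missing:
        return "no integration is missing telemetry"
    if len(missing) == len(telemetry_seen):
        return "all integrations affected"
    return "limited to: " + ", ".join(missing)
```

The result only tells you where to start looking: a problem limited to one integration points at that integration's Agent check or plugin, while a problem across all integrations points at the shared parts of the pipeline.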
If the problem relates to a single integration:
If the problem affects all integrations:
Telemetry is either pushed to StackState by a StackState Agent, or pulled from an external data source by a StackState plugin or the Prometheus mirror.
Telemetry synchronization process
- 1. StackState Agent:
- 2. StackState receiver:
- 3. Elasticsearch in StackState:
- 4. StackState plugins:
- Pull data from AWS, Azure, external Elasticsearch, Prometheus or Splunk at the Minimum live stream polling interval (seconds) configured for the data source.
- 5. Telemetry stream configuration:
- Specifies the telemetry data that should be included in the stream.
- For push-based synchronizations, Elasticsearch is queried to retrieve telemetry data.
- For pull-based integrations, telemetry data is requested from an external source system by a StackState plugin or the Prometheus mirror.
- Attaches retrieved telemetry data to the element in StackState.
For integrations that run through StackState Agent, StackState Agent is a good place to start an investigation.
- The integration can be triggered manually using the stackstate-agent check <check_name> -l debug command on your terminal. This command will not send any data to StackState. Instead, it will return the topology and telemetry collected to standard output along with any generated log messages.
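When the same check has to be run repeatedly while debugging, it can help to capture the output programmatically. The sketch below wraps the command named above with Python's `subprocess` module. The helper names `build_check_command` and `run_check` and the timeout value are assumptions for illustration, and the script assumes `stackstate-agent` is on the `PATH` of the user running it.

```python
import subprocess


def build_check_command(check_name: str) -> list[str]:
    """Return the argv for a manual, debug-level run of one Agent check."""
    if not check_name or check_name.startswith("-"):
        raise ValueError("check_name must be a plain check name")
    return ["stackstate-agent", "check", check_name, "-l", "debug"]


def run_check(check_name: str, timeout_seconds: float = 300.0) -> str:
    """Run the check on this host and return stdout followed by stderr.

    Assumes stackstate-agent is installed and on PATH; depending on the
    installation, elevated privileges may be needed.
    """
    completed = subprocess.run(
        build_check_command(check_name),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=False,  # a failing check is still worth reading
    )
    return completed.stdout + completed.stderr
```

Saving the returned text to a file makes it easier to compare runs before and after a configuration change.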
Note that for the Kubernetes and OpenShift integrations, different Agent types supply different sets of metrics.
- StackState Agents (node Agents): Supply metrics only from the node on which they are deployed. If cluster checks are not enabled, this will include metrics from kube-state-metrics if it is deployed on the same node.
- ClusterCheck Agent: When cluster checks are enabled, supplies metrics from kube-state-metrics.
The StackState receiver receives JSON data from the StackState Agent.
Telemetry data from push-based integrations is stored in an Elasticsearch index. The naming of the fields within the index is entirely based on the data retrieved from the external source system.
- Use the telemetry inspector to check which data is available in Elasticsearch by selecting the data source StackState Multi Metrics. All metrics available in the selected data source are listed under Select. Note that if no data is available for a telemetry stream, the telemetry inspector can still be opened by selecting inspect from the context menu (the triple dots menu in the top-right corner of the telemetry stream).
To add telemetry to an element, the filters specified for each telemetry stream attached to an element are used to build a query. For push-based synchronizations, Elasticsearch is queried to retrieve the associated telemetry data. For pull-based synchronizations, the associated StackState plugin queries the external data source directly.
- Check that data is available for the selected filters. An update to an external system may result in a change to the name applied to metrics in Elasticsearch or no results being returned when the external data source is queried.
- Use auto-complete to select the filters. This ensures that the correct names are entered.
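To see why a renamed field leads to an empty stream, it can help to look at how a set of filters might translate into an Elasticsearch query. The source does not show the query StackState actually generates, so the sketch below is only an illustration using the standard Elasticsearch Query DSL (`bool`, `term`, `exists`). The function name and the field names used in the usage note are hypothetical.

```python
def build_filter_query(metric_field: str, filters: dict[str, str]) -> dict:
    """Build an Elasticsearch query body from stream-style filters.

    Every filter must match its value exactly (term clauses) and the
    document must contain the metric field (exists clause).
    """
    clauses = [
        {"term": {field: value}} for field, value in sorted(filters.items())
    ]
    clauses.append({"exists": {"field": metric_field}})
    return {"query": {"bool": {"filter": clauses}}}
```

For example, `build_filter_query("cpu_usage", {"host": "web-1"})` only matches documents that have a `host` field equal to `web-1`. If an update to the source system started writing that value under a different field name, the same query would return nothing, which is the situation the checks above are meant to catch.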
When StackState is deployed on Kubernetes, each process runs in a pod with a descriptive name and logs are written to standard output.
The following logs may be useful when debugging telemetry synchronization:
- There is a pod for the StackState Receiver.
- There is a pod for each Kafka-to-Elasticsearch process. These processes are responsible for getting telemetry data to Elasticsearch. Note that there are processes for metrics, events, and traces. For example, the pod stackstate-mm2es is responsible for metrics.
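Because these pods log to standard output, their logs can be read with `kubectl logs`. The sketch below builds that command and filters the output for suspicious lines. The helper names, the default `tail` value, and the assumption that problem lines contain the text `ERROR` or `WARN` are illustrative and not taken from the StackState documentation. The namespace and the full pod name depend on your installation (the full name can be looked up with `kubectl get pods`).

```python
import subprocess


def build_logs_command(pod: str, namespace: str, tail: int = 200) -> list[str]:
    """Return the argv that fetches the last `tail` log lines of a pod."""
    return ["kubectl", "logs", pod, "--namespace", namespace, f"--tail={tail}"]


def suspicious_lines(log_text: str, needles=("ERROR", "WARN")) -> list[str]:
    """Keep only the lines that contain one of the given markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(needle in line for needle in needles)
    ]


def fetch_suspicious_lines(pod: str, namespace: str) -> list[str]:
    """Fetch recent pod logs with kubectl and filter them.

    Assumes kubectl is on PATH and configured for the right cluster.
    """
    completed = subprocess.run(
        build_logs_command(pod, namespace),
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl itself fails, e.g. unknown pod
    )
    return suspicious_lines(completed.stdout)
```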
When deployed on Linux, StackState log files are located in the directory:
The following log files may be useful when debugging telemetry synchronization:
- StackState Receiver:
/opt/stackstate/var/log/kafka-to-es- contains logs for the processes that are responsible for getting telemetry data to Elasticsearch. Note that there are separate processes for metrics, events, and traces.
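On a Linux installation, a quick way to decide which log file to read first is to count matching lines per file. The sketch below does that for one directory. The function name and the choice of `ERROR` as the marker are assumptions, and the directory is passed in as an argument rather than hard-coded.

```python
from pathlib import Path


def count_matches(log_dir: str, needle: str = "ERROR") -> dict[str, int]:
    """Count lines containing `needle` in each regular file directly under log_dir."""
    counts: dict[str, int] = {}
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            # errors="replace" keeps the scan going on undecodable bytes.
            with path.open(errors="replace") as handle:
                counts[path.name] = sum(1 for line in handle if needle in line)
    return counts
```

Files with a non-zero count are reasonable candidates to open first. Subdirectories are skipped, so call the function once per log directory of interest.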
StackState Agent log files are located in the directory: