Debug health synchronization

StackState Self-hosted v5.1.x


Last updated 2 years ago

Overview

The StackState CLI can be used to troubleshoot a health synchronization and fix issues that might prevent health data from being correctly ingested and displayed in StackState. This page describes the general troubleshooting steps to take when debugging a health synchronization, the CLI commands used and the error messages returned.

General troubleshooting steps

When debugging a health synchronization, some verification steps apply regardless of the specific issue:

  1. Verify that the stream exists.

  2. If you are using sub streams, verify that the substream exists. The response will also show the number of check states on the substream. This lets you know if the data is being ingested and processed.

  3. Investigate further:

    • Stream present - Check the stream status. This shows the latency metrics of the stream and any errors.

    • Streams / sub streams present, but there are no check states - Confirm that the payload sent to the Receiver API adheres to the health payload specification.

    • No streams / sub streams are present - Use the CLI command below to verify that health data sent to the Receiver API is arriving in StackState:

CLI: sts

From StackState v5.0, the old sts CLI has been renamed to stac and there is a new sts CLI. The command(s) provided here are for use with the new sts CLI.

➡️ Check which version of the sts CLI you are running

$ sts topic describe --name sts_health_sync

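CLI: stac (deprecated)

From StackState v5.0, the old sts CLI is called stac. The old CLI is now deprecated.

The new sts CLI replaces the stac CLI. It's advised to install the new sts CLI and upgrade any installed instance of the old sts CLI to stac. For details, see:

  • Which version of the sts CLI am I running?

  • Install the new sts CLI and upgrade the old sts CLI to stac

  • Comparison between the CLIs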

$ stac topic show sts_health_sync
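The branching in step 3 above can be written down as a small helper. This is only an illustrative sketch: the function name, its parameters and the returned strings are made up for this example and aren't part of StackState or its CLI.

```python
from typing import Optional


def next_debug_step(stream_found: bool, check_state_count: Optional[int]) -> str:
    """Map what the CLI shows to the follow-up action from step 3.

    Hypothetical helper: check_state_count is None when the count
    hasn't been looked up yet.
    """
    if not stream_found:
        # No streams / sub streams are present.
        return "verify that data arrives on the sts_health_sync topic"
    if check_state_count == 0:
        # Streams / sub streams present, but there are no check states.
        return "compare the payload with the health payload specification"
    # Stream present: look at latency and errors.
    return "check the stream status for latency and errors"


print(next_debug_step(stream_found=True, check_state_count=0))
```

The values passed in would come from the list and status commands described under "Useful CLI commands".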

Common issues

Check state not visible on the component

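There can be two reasons for a check state not to show on a component in StackState:

  • The health check state hasn't been created. Follow the general troubleshooting steps to confirm that the stream / substream has been created and that data is arriving in StackState.

  • The health check state was created, but its topologyElementIdentifier doesn't match any identifiers from the StackState topology. Use the CLI command show substream status to verify if there are any Check states with identifier which has no matching topology element.

Check state slow to update in StackState

The main reason for this is that the latency of the health synchronization is higher than expected. Use the CLI command show stream status to confirm the latency of the stream as well as the throughput of messages and specific check operations. It may be necessary to tweak the data sent to the health synchronization, or the frequency with which data is sent.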

Useful CLI commands

List streams

Returns a list of all currently synchronized health streams and the number of sub streams included in each.

CLI: sts

$ sts health list
STREAM URN                                              | STREAM CONSISTENCY MODEL | SUB STREAM COUNT
urn:health:sourceId:streamId                            | REPEAT_SNAPSHOTS         | 1

CLI: stac (deprecated)

# List streams
$ stac health list-streams

stream urn                                            substream count
--------------------------------------------------  ------------------
urn:health:sourceId:streamId                                         1

List sub streams

Returns a list of all sub streams for a given stream URN, together with the number of check states in each.

CLI: sts

$ sts health list -u urn:health:sourceId:streamId
SUB STREAM ID  | CHECK STATE COUNT
subStreamId1   | 1
subStreamId2   | 1

CLI: stac (deprecated)

# List sub streams
$ stac health list-sub-streams urn:health:sourceId:streamId

substream id                     check state count
------------------------------  -------------------
subStreamId1                                     20
subStreamId2                                     17
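When many sub streams are listed, it can help to pick out the ones without check states automatically. The sketch below parses the pipe-separated output format of the new sts CLI shown above. The function name is hypothetical, and the sample text is made up (one count is set to 0 for illustration).

```python
from typing import Dict


def parse_sub_stream_counts(cli_output: str) -> Dict[str, int]:
    """Parse 'SUB STREAM ID | CHECK STATE COUNT' rows into a dict."""
    counts: Dict[str, int] = {}
    for line in cli_output.strip().splitlines():
        if "|" not in line:
            continue
        name, _, count = line.partition("|")
        name, count = name.strip(), count.strip()
        if not count.isdigit():
            continue  # skip the header row
        counts[name] = int(count)
    return counts


# Made-up sample in the same layout as the sts output above.
sample = """\
SUB STREAM ID  | CHECK STATE COUNT
subStreamId1   | 1
subStreamId2   | 0
"""

empty = [name for name, n in parse_sub_stream_counts(sample).items() if n == 0]
print(empty)
```

A sub stream that exists but reports zero check states is a candidate for the payload check described in the general troubleshooting steps.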

Show stream status

The stream status command returns the aggregated stream latency and throughput metrics. This is helpful when debugging why a health check takes a long time to land on the expected topology elements. It will help diagnose if the frequency of data sent to StackState should be adjusted. The output includes a section Errors for non-existing sub streams: as some errors are only relevant when a substream couldn't be created, for example StreamMissingSubStream. Substream errors can be any of the documented error messages.

CLI: sts

$ sts health status -u urn:health:sourceId:streamId

CLI: stac (deprecated)

# Show a stream status
$ stac health show urn:health:sourceId:streamId

Aggregate metrics for the stream and all substreams:

metric                             value between now and 300 seconds ago    value between 300 and 600 seconds ago    value between 600 and 900 seconds ago
---------------------------------  ---------------------------------------  ---------------------------------------  ---------------------------------------
latency (Seconds)                  1.102                                    1.102                                    -
messages processed (per second)    0.256                                    0.16                                     -
check states created (per second)  0.10555555555555556                      0.10666666666666667                      -
check states updated (per second)  -                                        -                                        -
check states deleted (per second)  -                                        -                                        -

Errors for non-existing sub streams:

error message                                                                                   error occurrence count
----------------------------------------------------------------------------------------------  ------------------------
Substream `substream with ID `subStreamId2`` not started when receiving snapshot stop                          6
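To track latency over time, the metrics table in the stac output above can be parsed into numbers. This is a rough sketch that assumes columns are separated by two or more spaces, as in the example output. The function name is hypothetical and the parser has only been written against the layout shown on this page.

```python
import re
from typing import Dict, List, Optional


def parse_metrics(table: str) -> Dict[str, List[Optional[float]]]:
    """Return {metric name: [value per time window]}, with '-' as None."""
    metrics: Dict[str, List[Optional[float]]] = {}
    for line in table.splitlines():
        stripped = line.strip()
        # Skip blank lines and the dashed separator row.
        if set(stripped) <= {"-", " "}:
            continue
        cells = re.split(r"\s{2,}", stripped)
        if len(cells) < 2 or cells[0] == "metric":
            continue  # skip the header row and anything without values
        metrics[cells[0]] = [None if c == "-" else float(c) for c in cells[1:]]
    return metrics


sample = """\
metric                             value between now and 300 seconds ago    value between 300 and 600 seconds ago    value between 600 and 900 seconds ago
---------------------------------  ---------------------------------------  ---------------------------------------  ---------------------------------------
latency (Seconds)                  1.102                                    1.102                                    -
messages processed (per second)    0.256                                    0.16                                     -
check states updated (per second)  -                                        -                                        -
"""

latest_latency = parse_metrics(sample)["latency (Seconds)"][0]
print(latest_latency)
```

The first value in each list is the most recent window (between now and 300 seconds ago).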

Show substream status

The substream status provides useful information to verify that StackState could bind check states sent from an external system to existing topology elements. This information is helpful to debug why a specific check isn't visible on the expected topology element.

CLI: sts

$ sts health status -u urn:health:sourceId:streamId --sub-stream-urn subStreamId3

CLI: stac (deprecated)

# Show a substream status.
$ stac health show urn:health:sourceId:streamId -s "subStreamId3"

Synchronized check state count: 32
Repeat interval (Seconds): 120
Expiry (Seconds): 240

Synchronization errors:

code    level    message    occurrence count
------  -------  ---------  ------------------

Synchronization metrics:

metric                             value between now and 300 seconds ago    value between 300 and 600 seconds ago    value between 600 and 900 seconds ago
---------------------------------  ---------------------------------------  ---------------------------------------  ---------------------------------------
latency (Seconds)                  0.23                                     0.125                                    0.265
messages processed (per second)    0.256                                    0.2773333333333333                       0.256
check states created (per second)  -                                        -                                        -
check states updated (per second)  -                                        -                                        -
check states deleted (per second)  -

A substream status will show the metadata related to the consistency model:

  • Repeat Snapshots - Show repeat interval and expiry

  • Repeat States - Show repeat interval and expiry

  • Transactional Increments - Show checkpoint offset and checkpoint batch index
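The repeat interval and expiry shown in the substream status tie in with several of the error descriptions on this page: SubStreamRepeatIntervalTooHigh (configured maximum of 30 minutes), SubStreamMissingStop (timeout of two times repeat_interval_s) and SubStreamExpired (no data for longer than expiry_interval_s). The sketch below only restates that arithmetic. The function and key names are hypothetical, and the 30 minute limit is the configured value mentioned in the error description, so it may differ on your instance.

```python
from typing import Dict, Union

# Assumption: the configured maximum named in the
# SubStreamRepeatIntervalTooHigh description (30 minutes).
MAX_REPEAT_INTERVAL_S = 30 * 60


def snapshot_timing(repeat_interval_s: int, expiry_interval_s: int) -> Dict[str, Union[bool, int]]:
    """Derive the thresholds described in the error messages."""
    return {
        # SubStreamRepeatIntervalTooHigh
        "repeat_interval_too_high": repeat_interval_s > MAX_REPEAT_INTERVAL_S,
        # SubStreamMissingStop: two times the repeat interval
        "missing_stop_timeout_s": 2 * repeat_interval_s,
        # SubStreamExpired: no data for longer than the expiry interval
        "expires_after_s": expiry_interval_s,
    }


# Values from the example substream status above.
print(snapshot_timing(repeat_interval_s=120, expiry_interval_s=240))
```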

The substream status can be expanded to include details of matched and unmatched check states using the -t command line argument. This is helpful to identify any health states that aren't attached to a topology element. In the example below, checkStateId2 is listed under Check states with identifier which has no matching topology element. This means that it was not possible to match the check state to a topology element with the identifier server-2.

CLI: sts

$ sts health status -u urn:health:sourceId:streamId --sub-stream-urn subStreamId3 -t

CLI: stac (deprecated)

# Show a substream status matched/unmatched check states.
$ stac health show urn:health:sourceId:streamId -s "subStreamId3" -t
# If we configured our stream to not use explicit substreams then a default
# substream can be reached by omitting the optional substreamId parameter as in:
$ stac health show urn:health:sourceId:streamId -t

Check states with identifier matching exactly 1 topology element: 32

Check states with identifier which has no matching topology element:

check state id    topology element identifier
----------------  -----------------------------
checkStateId2     server-2

Check states with identifier which has multiple matching topology elements:

check state id    topology element identifier    number of matched topology elements
----------------  -----------------------------  -------------------------------------
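The unmatched check states are the ones that need attention, so it can be useful to extract just that section from the output above. The following is a minimal sketch that relies on the section headings and column layout shown in the example. The function name is hypothetical, and identifiers are assumed not to contain spaces.

```python
from typing import List, Tuple

NO_MATCH_HEADING = "Check states with identifier which has no matching topology element:"


def unmatched_check_states(output: str) -> List[Tuple[str, str]]:
    """Return (check state id, topology element identifier) pairs."""
    rows: List[Tuple[str, str]] = []
    in_section = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped == NO_MATCH_HEADING:
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("Check states with identifier"):
            break  # the next section has started
        if not stripped or stripped.startswith(("check state id", "-")):
            continue  # blank line, header row or separator row
        parts = stripped.split()
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


sample = """\
Check states with identifier matching exactly 1 topology element: 32

Check states with identifier which has no matching topology element:

check state id    topology element identifier
----------------  -----------------------------
checkStateId2     server-2

Check states with identifier which has multiple matching topology elements:
"""

print(unmatched_check_states(sample))  # [('checkStateId2', 'server-2')]
```

Each returned identifier can then be compared with the identifiers present in the StackState topology.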

Delete a health stream

The delete stream functionality is helpful while setting up a health synchronization in StackState. You can use it to experiment, delete the data and start over again clean. You can also delete a stream and drop its data when you are sure that you don't want to keep using it.

CLI: sts

$ sts health delete -u urn:health:sourceId:streamId

CLI: stac (deprecated)

# Delete a health synchronization stream
$ stac health delete urn:health:sourceId:streamId

Clear health stream errors

The clear-errors option removes all errors from a health stream. This is helpful while setting up a health synchronization in StackState or, in the case of the TRANSACTIONAL_INCREMENTS consistency model, when some errors can't be removed organically. For example, a request to delete a check state might raise an error if the check state isn't known to StackState. The only way to suppress such an error would be to use the clear-errors command.

CLI: sts

$ sts health clear-error -u urn:health:sourceId:streamId

CLI: stac (deprecated)

# Clear health stream errors
$ stac health clear-errors urn:health:sourceId:streamId

Error messages

Errors are closed once the described issue has been remediated.

For example, a SubStreamStopWithoutStart error is closed once the health synchronization observes a start snapshot message followed by a stop snapshot message.

  • StreamMissingSubStream - Raised when the health synchronization receives messages without a previous stream setup message such as start_snapshot or expiry.

  • StreamConsistencyModelMismatch - Raised when a message is received that belongs to a different consistency model than that specified when the stream was created.

  • StreamMissingSubStream - Raised when the health synchronization receives messages with a previous start snapshot in place.

  • SubStreamRepeatIntervalTooHigh - Raised when the health synchronization receives a repeat_interval_s greater than the configured max of 30 minutes.

  • SubStreamStartWithoutStop - Raised when the health synchronization receives a second message to open a snapshot when a previous snapshot was still open.

  • SubStreamCheckStateOutsideSnapshot - Raised when the health synchronization receives external check states without previously opening a snapshot.

  • SubStreamStopWithoutStart - Raised when the health synchronization receives a stop snapshot message without having started a snapshot at all.

  • SubStreamMissingStop - Raised when the health synchronization doesn't receive a stop snapshot after a timeout period of two times the repeat_interval_s established in the start snapshot message. In this case an automatic stop snapshot will be applied.

  • SubStreamExpired - Raised when the health synchronization stops receiving data on a particular substream for longer than the configured expiry_interval_s. In this case, the substream will be deleted.

  • SubStreamLateData - Raised when the health synchronization doesn't receive a complete snapshot in time, based on the established repeat_interval_s.

  • SubStreamTransformerError - Raised when the health synchronization is unable to interpret the payload sent to the receiver. For example, "Missing required field 'name'" with payload {"checkStateId":"checkStateId3","health":"deviating","message":"Unable to provision the device. ","topologyElementIdentifier":"server-3"} and transformation Default Transformation.

  • SubStreamMissingCheckpoint - Raised when a Transactional increments substream previously observed a checkpoint, but the received message is missing the previous_checkpoint.

  • SubStreamInvalidCheckpoint - Raised when a Transactional increments substream previously observed a checkpoint, but the received message has a previous_checkpoint that isn't equivalent to the last observed one.

  • SubStreamOutdatedCheckpoint - Raised when a Transactional increments substream previously observed a checkpoint, but the received message has a checkpoint that precedes the last observed one, meaning that it's data that StackState has already received.

  • SubStreamUnknownCheckState - Raised when deleting a Transactional increments check_state and the check_state_id isn't present on the substream.
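A SubStreamTransformerError such as the "Missing required field 'name'" example above can often be caught before sending data by checking each check state locally. The sketch below is not the StackState validation logic. The list of required fields is an assumption for illustration (only name is confirmed by the error example on this page), so compare it with the health payload specification before relying on it.

```python
import json
from typing import List

# Assumed set of required fields; only 'name' is confirmed by the
# SubStreamTransformerError example above.
REQUIRED_FIELDS = ("checkStateId", "name", "health", "topologyElementIdentifier")


def missing_fields(check_state_json: str) -> List[str]:
    """Return the assumed-required fields absent from a check state."""
    check_state = json.loads(check_state_json)
    return [field for field in REQUIRED_FIELDS if field not in check_state]


# Payload taken from the SubStreamTransformerError example above.
payload = (
    '{"checkStateId":"checkStateId3","health":"deviating",'
    '"message":"Unable to provision the device. ",'
    '"topologyElementIdentifier":"server-3"}'
)

print(missing_fields(payload))  # ['name']
```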

See also

  • StackState CLI

  • Install the StackState CLI

  • Comparison between the CLIs

  • Health payload specification