
Health synchronization

StackState v6.0


Last updated 10 months ago

This section describes the advanced topic of synchronizing custom health data from different monitoring systems to StackState. It is mainly of interest to engineers who want to build a custom integration with an existing monitoring system. For out of the box monitors, see the documentation on out of the box monitors.

Overview

Health synchronization adds existing health checks from external monitoring systems to StackState topology elements. Health data is calculated in the external monitoring system using its own data and rules, then automatically synchronized and attached to the associated topology elements in StackState.

Set up health synchronization

The StackState Receiver API automatically receives and processes all incoming health data. StackState doesn't require additional configuration to enable health synchronization. However, the health data received should match the expected JSON format.

Details on how to ingest health data can be found on the following page:

  • Ingest health data through the StackState Receiver API

Health synchronization pipeline

The health synchronization framework works as follows:

  • Health data is sent to StackState and ingested via the Receiver API.

  • StackState topology elements related to the ingested health checks are identified and bound based on:

    • the topology identifiers obtained during topology synchronization.

    • the topologyElementIdentifier from the ingested health payload.

  • StackState keeps track of changes to both topology elements and health checks to maintain up-to-date information.
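The binding step above can be pictured as a lookup from identifier to topology element. The following is a minimal sketch of that idea, not StackState's implementation. Only topologyElementIdentifier comes from this page; the TopologyElement class, the checkStateId and health keys, the health values and the example URNs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopologyElement:
    # Hypothetical stand-in for a StackState topology element.
    name: str
    identifiers: set  # identifiers obtained during topology synchronization
    check_states: dict = field(default_factory=dict)  # check state id -> health


def bind_check_states(elements, check_states):
    """Attach each check state to every element that carries its identifier.

    Returns the ids of check states that could not be bound to any element.
    """
    # Build an index from identifier to the elements that expose it.
    index = {}
    for element in elements:
        for identifier in element.identifiers:
            index.setdefault(identifier, []).append(element)

    unbound = []
    for check in check_states:
        targets = index.get(check["topologyElementIdentifier"], [])
        if not targets:
            unbound.append(check["checkStateId"])
        for element in targets:
            element.check_states[check["checkStateId"]] = check["health"]
    return unbound


elements = [
    TopologyElement("checkout-service", {"urn:example:service:checkout"}),
    TopologyElement("orders-db", {"urn:example:db:orders"}),
]
checks = [
    {"checkStateId": "disk-1", "health": "DEVIATING",
     "topologyElementIdentifier": "urn:example:db:orders"},
    {"checkStateId": "cpu-9", "health": "CLEAR",
     "topologyElementIdentifier": "urn:example:host:unknown"},
]
unbound = bind_check_states(elements, checks)
```

In this sketch a check state whose identifier matches no element is only reported as unbound. Because StackState tracks changes on both sides, such a check could presumably be bound later once a matching element appears.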

Consistency models

The REPEAT_SNAPSHOTS consistency model works with periodic, full snapshots of all checks in an external monitoring system. StackState keeps track of the checks in each received snapshot and decides if associated external check states need to be created, updated or deleted in StackState. For example, a check state that is no longer present in a snapshot is deleted. This model offers full control over which external checks will be deleted, as all decisions are inferred from the received snapshots. There is no ambiguity over the external checks that will be present in StackState.

Use this model when: The external monitoring system is capable of keeping track of which elements are present in a given time window and can therefore communicate what the full snapshot looks like.
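The create, update and delete decisions described above amount to a diff between the checks StackState already knows and the checks in the latest snapshot. The following is a minimal sketch of such a diff, assuming check states are held as a simple mapping from a check id to a health value. The function name, the check ids and the health values are hypothetical and are not taken from StackState.

```python
def reconcile_snapshot(known, snapshot):
    """Compare the known check states with a full snapshot.

    Both arguments map a check id to its health value.
    Returns (created, updated, deleted).
    """
    created = {cid: state for cid, state in snapshot.items() if cid not in known}
    updated = {
        cid: state
        for cid, state in snapshot.items()
        if cid in known and known[cid] != state
    }
    # Anything known but absent from the snapshot is deleted.
    deleted = sorted(cid for cid in known if cid not in snapshot)
    return created, updated, deleted


known = {"check-a": "CLEAR", "check-b": "CLEAR", "check-c": "CRITICAL"}
snapshot = {"check-a": "CLEAR", "check-b": "DEVIATING", "check-d": "CLEAR"}
created, updated, deleted = reconcile_snapshot(known, snapshot)
```

Here check-d is created, check-b is updated and check-c is deleted, purely because of what the snapshot does and does not contain.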

The REPEAT_STATES consistency model works with periodic checks received from an external monitoring system. StackState keeps track of the checks and decides if associated external checks need to be created or updated in StackState. A configurable expiry mechanism is used to delete external checks that aren't observed anymore. This model offers less control over data than the REPEAT_SNAPSHOTS model. As an expiry configuration is used to delete external checks, a check can be deleted because its next update only just missed the expiry timeout. This shows up as external checks disappearing and reappearing in StackState.

Use this model when: The external monitoring system isn't capable of collecting all checks in a determined time window. The best effort is just to send the external checks as they're obtained.
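The expiry mechanism can be sketched as a check on when each external check was last observed. This is an illustration under assumptions, not StackState's implementation: the function name, the last_seen mapping and the use of plain seconds are hypothetical.

```python
def expired_checks(last_seen, now, expire_interval_s):
    """Return the ids of checks not observed within the expire interval.

    last_seen maps a check id to the time (in seconds) it was last received.
    """
    return sorted(
        cid for cid, seen in last_seen.items() if now - seen > expire_interval_s
    )


last_seen = {"check-a": 100.0, "check-b": 39.0}
# check-b was last seen 61 seconds ago, one second past a 60 second timeout.
stale = expired_checks(last_seen, now=100.0, expire_interval_s=60.0)
```

The example also shows the drawback mentioned above: check-b is removed after missing the timeout by one second, and would reappear as soon as its next state arrives.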

The TRANSACTIONAL_INCREMENTS consistency model is designed to be used on streaming systems where only incremental changes are communicated to StackState. As there is no repetition of data, data consistency is upheld by ensuring that at-least-once delivery is guaranteed across the entire pipeline. To detect whether any data is missing, StackState requires that both a checkpoint and the previous checkpoint are communicated together with the check_states. This model requires strict control across the whole pipeline to guarantee no data loss.

Use this model when: The external monitoring system doesn't have access to the complete state of all external checks and only works with an event-based approach.
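Sending the previous checkpoint along with the current one lets the receiver verify that increments form an unbroken chain. The following is a minimal sketch of that check, assuming checkpoints are increasing integers and that at-least-once delivery can produce duplicates. The function, the exception and the dictionary keys are hypothetical and do not reflect the actual payload format.

```python
class MissingDataError(Exception):
    """Raised when an increment does not follow the last accepted checkpoint."""


def accept_increment(last_checkpoint, increment):
    """Validate one increment against the last accepted checkpoint.

    Returns (new_last_checkpoint, applied). A redelivered increment is
    skipped; a gap in the checkpoint chain raises MissingDataError.
    """
    checkpoint = increment["checkpoint"]
    previous = increment["previous_checkpoint"]

    # At-least-once delivery may repeat data that was already processed.
    if last_checkpoint is not None and checkpoint <= last_checkpoint:
        return last_checkpoint, False

    # The increment must continue exactly where the last one stopped.
    if previous != last_checkpoint:
        raise MissingDataError(
            f"expected previous checkpoint {last_checkpoint}, got {previous}"
        )
    return checkpoint, True


stream = [
    {"checkpoint": 1, "previous_checkpoint": None},
    {"checkpoint": 2, "previous_checkpoint": 1},
    {"checkpoint": 2, "previous_checkpoint": 1},  # redelivered duplicate
]
last = None
applied = []
for increment in stream:
    last, was_applied = accept_increment(last, increment)
    applied.append(was_applied)
```

An increment with checkpoint 4 and previous checkpoint 3 arriving next would raise MissingDataError, since the increment for checkpoint 3 was never received.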

StackState health synchronization relies on different consistency models to guarantee that the data sent from an external monitoring system matches what StackState ingests and shows. The consistency model is specified in the "health" property of the common JSON object or as an argument in the StackState CLI when health data is sent to StackState. The supported models are: REPEAT_SNAPSHOTS, REPEAT_STATES and TRANSACTIONAL_INCREMENTS.

JSON payload for REPEAT_SNAPSHOTS: The Repeat Snapshots health payload accepts specific properties to specify when a snapshot starts or stops.

JSON payload for REPEAT_STATES: The Repeat States health payload accepts specific properties to specify the expiry configuration.

JSON payload for TRANSACTIONAL_INCREMENTS: The metadata repeat_interval and expire_interval aren't relevant for the Transactional Increments health payload as there is no predefined periodicity on the data.

Health stream and substream

External monitoring systems send health data to the StackState Receiver in a health stream. Each health stream has at least one substream with health checks.

Health stream

The health stream uniquely identifies the health synchronization and defines the boundaries within which the health check states should be processed together.

Substream

Substreams contain the health check data that is processed by StackState. When working with health data from a distributed external monitoring system, multiple substreams can be configured, each containing health snapshots from a single location. The data in each substream is semi-independent, but contributes to the health check states of the complete health stream. If a single location is responsible for reporting the health check states of the health stream, you can omit the sub_stream_id from the health payload. StackState will assume that all the external health checks belong to a single, default substream.

Repeat Interval

Health synchronization processes the ingested health data per substream. The repeat interval specified in the health payload is the commitment from the external monitoring system to send complete snapshots repeatedly to keep the data in StackState up to date. This lets StackState inform the user how up to date the health synchronization is.

Expire Interval

The expire interval can be used to configure substreams in the health synchronization to delete data that isn't sent by the external system anymore. This is helpful when the source for a substream could be decommissioned, after which StackState would not hear from it again. Without an expire interval, the previously synchronized data would be left permanently hanging.

Check State

The health check state is calculated by an external monitoring system and includes all information required to attach it to a topology element. To materialize it and attach it to a component, the health state must be attributed to a particular monitor, in this case an ExternalMonitor. Once attached to a topology element, the health check state contributes to the element's own health state.

External Monitor

An external monitor makes it possible to attach health states to components and to show a remediationHint on the StackState highlight pages. This resource needs to be created via the StackState CLI or as part of a stackpack. Here is an example of an external monitor:

    {
      "_type": "ExternalMonitor",
      "healthStreamUrn": "urn:health:kubernetes:external-health",
      "description": "Monitored by external tool.",
      "identifier": "urn:custom:external-monitor:heartbeat",
      "name": "External Monitor Heartbeat",
      "remediationHint": "",
      "tags": [
        "heartbeat"
      ]
    }

Every ExternalMonitor payload has the following details:

  • _type: StackState needs to know this is a monitor, so the value always needs to be ExternalMonitor.

  • healthStreamUrn: This field needs to match the URN that is sent as part of the health payload.

  • description: A description of the external monitor.

  • identifier: An identifier of the form urn:custom:external-monitor:.... which uniquely identifies the external monitor when updating its configuration.

  • name: The name of the external monitor.

  • remediationHint: A description of what the user can do when the monitor fails. The format is markdown.

  • tags: Add tags to the monitor to help organize them in the monitors overview of your StackState instance, http://your-StackState-instance/#/monitors

Here is an example of how to create an External Monitor using the StackState CLI:

  • Create a new YAML file called externalMonitor.yaml and add this YAML template to it to create your own external monitor.

nodes:
- _type: ExternalMonitor
  healthStreamUrn: urn:health:sourceId:streamId
  description: Monitored by external tool.
  identifier: urn:custom:external-monitor:heartbeat
  name: External Monitor Heartbeat
  remediationHint: |-
    To remedy this issue with the deployment {{ labels.deployment }}, consider taking the following steps:
    
    1. Look at the logs of the pods created by the deployment
  tags:
    - heartbeat
  • Use the CLI to create the external monitor.

sts settings apply -f externalMonitor.yaml
✅ Applied 1 setting node(s).

TYPE            | ID              | IDENTIFIER                            | NAME                      
ExternalMonitor | 150031117290020 | urn:custom:external-monitor:heartbeat | External Monitor Heartbeat

See also

  • JSON health payload