LogoLogo
StackState.comDownloadSupportExplore playground
StackState v5.1
StackState v5.1
  • Welcome to the StackState docs!
  • StackState self-hosted v5.1 docs
  • Getting Started
  • 🚀Setup
    • Install StackState
      • Requirements
      • Kubernetes / OpenShift
        • Kubernetes install
        • OpenShift install
        • Required Permissions
        • Non-high availability setup
        • Override default configuration
        • Configure storage
        • Configure Ingress
        • Install from custom image registry
        • Migrate from Linux install
      • Linux
        • Before you install
        • Download
        • Install StackState
        • Install with production configuration
        • Install with development configuration
        • Install with POC configuration
        • Set up a reverse proxy
        • Set up TLS without reverse proxy
      • Initial run guide
      • Troubleshooting
    • Upgrade StackState
      • Steps to upgrade
      • Version specific upgrade instructions
      • StackPack versions
      • StackState release notes
    • StackState Agent
      • About StackState Agent V3
      • Docker
      • Kubernetes / OpenShift
      • Linux
      • Windows
      • Advanced Agent configuration
      • Use an HTTP/HTTPS proxy
      • Agent V1 (legacy)
      • Migrate Agent V1 to Agent V2
        • Linux
        • Docker
    • StackState CLI
      • CLI: sts
      • CLI: stac (deprecated)
      • Comparison between CLIs
    • Data management
      • Backup and Restore
        • Kubernetes backup
        • Linux backup
        • Configuration backup
      • Data retention
      • Clear stored data
  • 👤Use
    • Concepts
      • The 4T data model
      • Components
      • Relations
      • Health state
      • Layers, Domains and Environments
      • Perspectives
      • Anomaly detection
      • StackState architecture
    • StackState UI
      • Explore mode
      • Filters
      • Views
        • About views
        • Configure the view health
        • Create and edit views
        • Visualization settings
      • Perspectives
        • Topology Perspective
        • Events Perspective
        • Traces Perspective
        • Metrics Perspective
      • Timeline and time travel
      • Analytics
      • Keyboard shortcuts
    • Checks and monitors
      • Checks
      • Add a health check
      • Anomaly health checks
      • Monitors
      • Manage monitors
    • Problem analysis
      • About problems
      • Problem lifecycle
      • Investigate a problem
      • Problem notifications
    • Metrics
      • Telemetry streams
      • Golden signals
      • Top metrics
      • Add a telemetry stream
      • Browse telemetry
      • Set telemetry stream priority
    • Events
      • About events
      • Event notifications
      • Manage event handlers
    • Glossary
  • 🧩StackPacks
    • About StackPacks
    • Add-ons
      • Autonomous Anomaly Detector
      • Health Forecast
    • Integrations
      • About integrations
      • 💠StackState Agent V2
      • 💠AWS
        • AWS
        • AWS ECS
        • AWS X-ray
        • StackState/Agent IAM role: EC2
        • StackState/Agent IAM role: EKS
        • Policies for AWS
        • AWS (legacy)
        • Migrate AWS (legacy) to AWS
      • 💠Dynatrace
      • 💠Kubernetes
      • 💠OpenShift
      • 💠OpenTelemetry
        • About instrumentations
        • AWS NodeJS Instrumentation
        • Manual Instrumentation
          • Prerequisites
          • Tracer and span mappings
          • Relations between components
          • Span health state
          • Merging components
          • Code examples
      • 💠ServiceNow
      • 💠Slack
      • 💠Splunk
        • Splunk
        • Splunk Events
        • Splunk Health
        • Splunk Metrics
        • Splunk Topology
      • 💠VMWare vSphere
      • Apache Tomcat
      • Azure
      • Cloudera
      • Custom Synchronization
      • DotNet APM
      • Elasticsearch
      • Humio
      • Java APM
      • JMX
      • Logz.io
      • MySQL
      • Nagios
      • OpenMetrics
      • PostgreSQL
      • Prometheus
      • SAP
      • SCOM
      • SolarWinds
      • Static Health
      • Static Topology
      • Traefik
      • WMI
      • Zabbix
    • Develop your own StackPacks
  • 🔧Configure
    • Topology
      • Component actions
      • Identifiers
      • Topology naming guide
      • Topology sources
      • Create a topology manually
      • Configure topology synchronizations
      • Enable email event notifications
      • Send topology data over HTTP
      • Set the topology filtering limit
      • Use a proxy for event handlers
      • Use tags
      • Tune topology synchronization
      • Debug topology synchronization
    • Telemetry
      • Add telemetry during topology synchronization
      • Data sources
        • Elasticsearch
        • Prometheus mirror
      • Send events over HTTP
      • Send metrics data over HTTP
      • Set the default telemetry interval
      • Debug telemetry synchronization
    • Traces
      • Set up traces
      • Advanced configuration for traces
    • Health
      • Health synchronization
      • Send health data over HTTP
        • Send health data
        • Repeat Snapshots JSON
        • Repeat States JSON
        • Transactional Increments JSON
      • Debug health synchronization
    • Anomaly Detection
      • Export anomaly feedback
      • Scale the AAD up and down
      • The AAD status UI
    • Security
      • Authentication
        • Authentication options
        • File based
        • LDAP
        • Open ID Connect (OIDC)
        • KeyCloak
        • Service tokens
      • RBAC
        • Role-based Access Control
        • Permissions
        • Roles
        • Scopes
        • Subjects
      • Secrets management
      • Self-signed certificates
      • Set up a security backend for Linux
      • Set up a security backend for Windows
    • Logging
      • Kubernetes logs
      • Linux logs
      • Enable logging for functions
  • 📖Develop
    • Developer guides
      • Agent checks
        • About Agent checks
        • Agent check API
        • Agent check state
        • How to develop Agent checks
        • Connect an Agent check to StackState
      • Custom functions and scripts
        • StackState functions
        • Check functions
        • Component actions
        • Event handler functions
        • ID extractor functions
        • Mapping functions
        • Monitor functions
        • Propagation functions
        • Template functions
        • View health state configuration functions
      • Custom Synchronization StackPack
        • About the Custom Synchronization StackPack
        • How to customize elements created by the Custom Synchronization StackPack
        • How to configure a custom synchronization
      • Integrate external services
      • Mirroring Telemetry
      • Monitors
        • Create monitors
        • Monitor STJ file format
      • StackPack development
        • How to create a StackPack
        • Packaging
        • How to get a template file
        • How to make a multi-instance StackPack
        • Prepare a multi-instance provisioning script
        • Upload a StackPack file
        • Prepare a shared template
        • Customize a StackPack
        • Prepare instance template files
        • Prepare a StackPack provisioning script
        • Resources in a StackPack
        • StackState Common Layer
      • Synchronizations and templated files
    • Reference
      • StackState OpenAPI docs
      • StackState Template JSON (STJ)
        • Using STJ
        • Template functions
      • StackState Markup Language (STML)
        • Using STML
        • STML Tags
      • StackState Query Language (STQL)
      • StackState Scripting Language (STSL)
        • Scripting in StackState
        • Script result: Async
        • Script result: Streaming
        • Time in scripts
        • Script APIs
          • Async - script API
          • Component - script API
          • HTTP - script API
          • Prediction - script API
          • StackPack - script API
          • Telemetry - script API
          • Time - script API
          • Topology - script API
          • UI - script API
          • View - script API
    • Tutorials
      • Create a simple StackPack
      • Push data to StackState from an external system
      • Send events to StackState from an external system
      • Set up a mirror to pull telemetry data from an external system
Powered by GitBook
LogoLogo

Legal notices

  • Privacy
  • Cookies
  • Responsible disclosure
  • SOC 2/SOC 3
On this page
  • Overview
  • Add a health check
  • Metric stream configuration
  • Windowing method
  • Time window (or window size)
  • Check functions
  • Check function: Autonomous metric stream anomaly detection
  • Synchronize external health data
  • See also
  1. Use
  2. Checks and monitors

Add a health check

StackState Self-hosted v5.1.x

PreviousChecksNextAnomaly health checks

Last updated 2 years ago

Overview

Health checks report a health state for elements (components and relations). The health state can either be calculated internally by StackState based on data from telemetry streams or synchronized with an external monitoring system.

The combined check states attached to an element are used to calculate its overall health status. When the status of an element changes, a state change event is generated. These events can be used to .

Add a health check

Most elements in the StackState topology will have a relevant health check added when they're created. If required, you can also add custom health checks that calculate a health state in StackState based on available telemetry streams or .

To add a health check calculated in StackState:

  1. Select the element that you want to assign a health check to.

    • Detailed information about the element will be displayed in the right panel details tab - Component details or Direct relation details depending on the element type that you selected.

    • If no telemetry stream is available on the selected element, you will need to first before you can add a health check.

  2. Click ADD NEW HEALTH CHECK under Health in the right panel details tab.

  3. In the Add check dialog box, enter the following details:

    • Name - The health check name. Will be displayed in the StackState UI right panel details tab Health section.

    • Description - Optional, can be used to explain the check in greater detail.

    • Remediation hint - Optional, will be automatically displayed on the element when this check goes to a non-clear state, for example critical or deviating.

    • Check function - The check function to use to monitor the element's telemetry stream(s). See below.

  4. Provide the required check function arguments, these will vary according to the check function selected, but will include:

    • At least one telemetry stream. Some checks will require multiple streams.

    • For metrics check functions, a .

  5. Click CREATE to create the health check.

    • The check is now active and visible under the Health section on the right.

    • The check will remain gray until enough telemetry data has been received to determine a health state.

Metric stream configuration

Windowing method

For metrics check functions, a windowing method and window size must be provided. This determines how often the check function will run based on the incoming metrics. There are two possible windowing methods, batching and sliding.

Method
Description

Batching

The batching windowing method groups metric data into strictly separate windows of the configured window time, with consistent start and end times. For example, with window size set to 60 seconds, a batching check will run every minute with metrics from the previous minute.

Sliding

The sliding windowing method groups metric data into overlapping windows. For example, with window size set to 60 seconds, a sliding check will run whenever the data flows in after 60 seconds of metrics have been collected. Note that runs of the check will adhere to the Minimum live stream polling interval configured for the data source.

Time window (or window size)

By default, the time window is 300000 milliseconds (or 5 minutes). The time window will directly influence the number of positive or false negative alerts. The longer you configure the time window, the less sensitive it will be. However, if it's too short this may lead to a sudden spike in unwanted alerts, which might not help you meet your SLO. You should balance the time window based on the metric and how early you want to be alerted on spikes.

Check functions

Each health check calculated in StackState uses a check function to monitor the telemetry stream attached to the element.

Check functions are scripts that take streaming telemetry as an input, check the data based on its logic and on the supplied arguments and output a health state. The telemetry changes a check function responds to determine the way in which the health check reports element health state, for example by monitoring a metric stream for thresholds and spikes, or checking the generated events. A number of check functions are included out of the box with StackState.

  • Details of the available check functions can be found in the StackState UI, go to Settings > Check functions.

Check function: Autonomous metric stream anomaly detection

The Autonomous metric stream anomaly detection health check reacts to anomaly events and sets the component health state to the DEVIATING (orange).

Synchronize external health data

See also

You can to customize how StackState assigns a health state to a metric stream.

➡️

You can from an external monitoring system and add them to StackState topology elements.

👤
create a custom check function
Learn more about anomaly health checks
synchronize existing health checks
Anomaly health checks
Add a telemetry stream to an element
Custom check functions
Custom monitor functions
Synchronize external health data
trigger event notifications and actions
add a telemetry stream
synchronize health data from an external monitoring system
Check functions
windowing method and window size
Add a health check to an element
Add an event notification