LogoLogo
StackState.comDownloadSupportExplore playground
StackState v5.1
StackState v5.1
  • Welcome to the StackState docs!
  • StackState self-hosted v5.1 docs
  • Getting Started
  • 🚀Setup
    • Install StackState
      • Requirements
      • Kubernetes / OpenShift
        • Kubernetes install
        • OpenShift install
        • Required Permissions
        • Non-high availability setup
        • Override default configuration
        • Configure storage
        • Configure Ingress
        • Install from custom image registry
        • Migrate from Linux install
      • Linux
        • Before you install
        • Download
        • Install StackState
        • Install with production configuration
        • Install with development configuration
        • Install with POC configuration
        • Set up a reverse proxy
        • Set up TLS without reverse proxy
      • Initial run guide
      • Troubleshooting
    • Upgrade StackState
      • Steps to upgrade
      • Version specific upgrade instructions
      • StackPack versions
      • StackState release notes
    • StackState Agent
      • About StackState Agent V3
      • Docker
      • Kubernetes / OpenShift
      • Linux
      • Windows
      • Advanced Agent configuration
      • Use an HTTP/HTTPS proxy
      • Agent V1 (legacy)
      • Migrate Agent V1 to Agent V2
        • Linux
        • Docker
    • StackState CLI
      • CLI: sts
      • CLI: stac (deprecated)
      • Comparison between CLIs
    • Data management
      • Backup and Restore
        • Kubernetes backup
        • Linux backup
        • Configuration backup
      • Data retention
      • Clear stored data
  • 👤Use
    • Concepts
      • The 4T data model
      • Components
      • Relations
      • Health state
      • Layers, Domains and Environments
      • Perspectives
      • Anomaly detection
      • StackState architecture
    • StackState UI
      • Explore mode
      • Filters
      • Views
        • About views
        • Configure the view health
        • Create and edit views
        • Visualization settings
      • Perspectives
        • Topology Perspective
        • Events Perspective
        • Traces Perspective
        • Metrics Perspective
      • Timeline and time travel
      • Analytics
      • Keyboard shortcuts
    • Checks and monitors
      • Checks
      • Add a health check
      • Anomaly health checks
      • Monitors
      • Manage monitors
    • Problem analysis
      • About problems
      • Problem lifecycle
      • Investigate a problem
      • Problem notifications
    • Metrics
      • Telemetry streams
      • Golden signals
      • Top metrics
      • Add a telemetry stream
      • Browse telemetry
      • Set telemetry stream priority
    • Events
      • About events
      • Event notifications
      • Manage event handlers
    • Glossary
  • 🧩StackPacks
    • About StackPacks
    • Add-ons
      • Autonomous Anomaly Detector
      • Health Forecast
    • Integrations
      • About integrations
      • 💠StackState Agent V2
      • 💠AWS
        • AWS
        • AWS ECS
        • AWS X-ray
        • StackState/Agent IAM role: EC2
        • StackState/Agent IAM role: EKS
        • Policies for AWS
        • AWS (legacy)
        • Migrate AWS (legacy) to AWS
      • 💠Dynatrace
      • 💠Kubernetes
      • 💠OpenShift
      • 💠OpenTelemetry
        • About instrumentations
        • AWS NodeJS Instrumentation
        • Manual Instrumentation
          • Prerequisites
          • Tracer and span mappings
          • Relations between components
          • Span health state
          • Merging components
          • Code examples
      • 💠ServiceNow
      • 💠Slack
      • 💠Splunk
        • Splunk
        • Splunk Events
        • Splunk Health
        • Splunk Metrics
        • Splunk Topology
      • 💠VMWare vSphere
      • Apache Tomcat
      • Azure
      • Cloudera
      • Custom Synchronization
      • DotNet APM
      • Elasticsearch
      • Humio
      • Java APM
      • JMX
      • Logz.io
      • MySQL
      • Nagios
      • OpenMetrics
      • PostgreSQL
      • Prometheus
      • SAP
      • SCOM
      • SolarWinds
      • Static Health
      • Static Topology
      • Traefik
      • WMI
      • Zabbix
    • Develop your own StackPacks
  • 🔧Configure
    • Topology
      • Component actions
      • Identifiers
      • Topology naming guide
      • Topology sources
      • Create a topology manually
      • Configure topology synchronizations
      • Enable email event notifications
      • Send topology data over HTTP
      • Set the topology filtering limit
      • Use a proxy for event handlers
      • Use tags
      • Tune topology synchronization
      • Debug topology synchronization
    • Telemetry
      • Add telemetry during topology synchronization
      • Data sources
        • Elasticsearch
        • Prometheus mirror
      • Send events over HTTP
      • Send metrics data over HTTP
      • Set the default telemetry interval
      • Debug telemetry synchronization
    • Traces
      • Set up traces
      • Advanced configuration for traces
    • Health
      • Health synchronization
      • Send health data over HTTP
        • Send health data
        • Repeat Snapshots JSON
        • Repeat States JSON
        • Transactional Increments JSON
      • Debug health synchronization
    • Anomaly Detection
      • Export anomaly feedback
      • Scale the AAD up and down
      • The AAD status UI
    • Security
      • Authentication
        • Authentication options
        • File based
        • LDAP
        • Open ID Connect (OIDC)
        • KeyCloak
        • Service tokens
      • RBAC
        • Role-based Access Control
        • Permissions
        • Roles
        • Scopes
        • Subjects
      • Secrets management
      • Self-signed certificates
      • Set up a security backend for Linux
      • Set up a security backend for Windows
    • Logging
      • Kubernetes logs
      • Linux logs
      • Enable logging for functions
  • 📖Develop
    • Developer guides
      • Agent checks
        • About Agent checks
        • Agent check API
        • Agent check state
        • How to develop Agent checks
        • Connect an Agent check to StackState
      • Custom functions and scripts
        • StackState functions
        • Check functions
        • Component actions
        • Event handler functions
        • ID extractor functions
        • Mapping functions
        • Monitor functions
        • Propagation functions
        • Template functions
        • View health state configuration functions
      • Custom Synchronization StackPack
        • About the Custom Synchronization StackPack
        • How to customize elements created by the Custom Synchronization StackPack
        • How to configure a custom synchronization
      • Integrate external services
      • Mirroring Telemetry
      • Monitors
        • Create monitors
        • Monitor STJ file format
      • StackPack development
        • How to create a StackPack
        • Packaging
        • How to get a template file
        • How to make a multi-instance StackPack
        • Prepare a multi-instance provisioning script
        • Upload a StackPack file
        • Prepare a shared template
        • Customize a StackPack
        • Prepare instance template files
        • Prepare a StackPack provisioning script
        • Resources in a StackPack
        • StackState Common Layer
      • Synchronizations and templated files
    • Reference
      • StackState OpenAPI docs
      • StackState Template JSON (STJ)
        • Using STJ
        • Template functions
      • StackState Markup Language (STML)
        • Using STML
        • STML Tags
      • StackState Query Language (STQL)
      • StackState Scripting Language (STSL)
        • Scripting in StackState
        • Script result: Async
        • Script result: Streaming
        • Time in scripts
        • Script APIs
          • Async - script API
          • Component - script API
          • HTTP - script API
          • Prediction - script API
          • StackPack - script API
          • Telemetry - script API
          • Time - script API
          • Topology - script API
          • UI - script API
          • View - script API
    • Tutorials
      • Create a simple StackPack
      • Push data to StackState from an external system
      • Send events to StackState from an external system
      • Set up a mirror to pull telemetry data from an external system
Powered by GitBook
LogoLogo

Legal notices

  • Privacy
  • Cookies
  • Responsible disclosure
  • SOC 2/SOC 3
On this page
  • Overview
  • Requirements
  • Four golden signals
  • Latency
  • Traffic
  • Errors
  • Saturation
  • Checks
  • Example: Error percentage
  • Example: Response time
  • Configuration and limitations
  1. Use
  2. Metrics

Golden signals

StackState Self-hosted v5.1.x

PreviousTelemetry streamsNextTop metrics

Last updated 2 years ago

Overview

To assist in monitoring distributed systems with a defined SLO (Service Level Objective), StackState can be configured to alert you if an SLI (Service Level Indicator) falls below a defined threshold. StackState Agent V2 deployed on a Linux host will retrieve telemetry that can be used to monitor the . These metrics can then be used to build a in StackState that responds to fluctuations in service level.

The checks described on this page don't ensure that you are meeting your SLO directly, but they can help prevent an SLO violation by catching and alerting on changes in your SLIs as soon as possible.

Requirements

To work with golden signals and the checks described on this page, you need to have:

  • The installed in StackState.

  • version 2.12 or higher running on a Linux host with a network tracer.

Four golden signals

Latency

To monitor the time it takes to service a request, StackState supports metric streams for HTTP response time for processes and services.

  • any

  • success (100-399)

  • 1xx

  • 2xx

  • 3xx

  • 4xx

  • 5xx

By default, processes and services that serve HTTP requests have the following response time streams:

  • HTTP total response time (s) (95th percentile)

  • HTTP 5xx error response time (s) (95th percentile)

  • HTTP 4xx error response time (s) (95th percentile)

  • HTTP Success response time (s) (95th percentile)

Traffic

Similar to measuring the latency, StackState Agent V2 supports the http_requests_per_second telemetry stream. The same response codes and predefined groups are also supported for the traffic stream.

By default, processes and services that serve HTTP requests have the following request rate streams:

  • HTTP total rate (req/s)

  • HTTP 5xx error rate (req/s)

  • HTTP 4xx error rate (req/s)

  • HTTP Success rate (req/s)

Errors

Saturation

There are many ways StackState can help monitor the saturation of a system, for example:

  • HTTP Requests per second

  • CPU usage

  • Memory usage

Checks

Example: Error percentage

The Error percentage check function can be used to monitor two streams - one reporting errors and one reporting a total. A DEVIATING or CRITICAL health state will be returned if the percentage of errors/total crosses the specified DeviatingThresholdPercentage or CriticalThresholdPercentage.

If your SLO defines that a service can have a maximum of 5% of requests failing, you can create a check using the Error percentage function and set the CriticalThresholdPercentage to 5.0:

Example: Response time

The Greater than or equal check function can alert you when one of your telemetry streams is above a certain threshold. A DEVIATING or CRITICAL health state will be returned if the specified DeviatingThreshold or CriticalThreshold is crossed.

Use this function to make sure you meet your SLO for maximum response time, for example:

Configuration and limitations

Note that StackState Agent can only report on request rate and response time of HTTP/1 protocol. HTTP/2, HTTP/3 and HTTPS protocols aren't currently supported.

To add a latency stream, and select the following metric: http_response_time_seconds. You can filter the stream on any HTTP response code or any of the predefined groups:

StackState allows you to monitor on any specific HTTP error code or one of the 4xx or 5xx error groups, as explained above. If your SLO specifies a limit for the rate of errors in your system, you can .

From StackState Agent V2.13, monitoring golden signals will automatically be added for streams with HTTP traffic.

To help meet a specific SLA (Service Level Agreement), you can in StackState. Examples of using a check function to monitor error percentage and response time are given below.

The metrics described in this page are gathered by the StackState Agent and can be disabled. Refer to the for more information.

👤
add a telemetry stream
Anomaly health checks
StackState Agent documentation
add a check
four golden signals (sre.google)
StackState Agent V2 StackPack
StackState Agent V2
check
add health checks
HTTP total response time (s)
HTTP response code
HTTP total requests per second
HTTP 5xx error rate
Error percentage check
Response time check