LogoLogo
StackState.comDownloadSupportExplore playground
SUSE Observability
SUSE Observability
  • SUSE Observability docs!
  • Docs for all SUSE Observability products
  • 🚀Get started
    • Quick start guide
    • SUSE Observability walk-through
    • SUSE Rancher Prime
      • Air-gapped
      • Agent Air-gapped
    • SUSE Cloud Observability
  • 🦮Guided troubleshooting
    • What is guided troubleshooting?
    • YAML Configuration
    • Changes
    • Logs
  • 🚨Monitors and alerts
    • Monitors
    • Out of the box monitors for Kubernetes
    • Notifications
      • Configure notifications
      • Notification channels
        • Slack
        • Teams
        • Webhook
        • Opsgenie
      • Troubleshooting
    • Customize
      • Add a monitor using the CLI
      • Derived State monitor
      • Dynamic Threshold monitor
      • Override monitor arguments
      • Write a remediation guide
  • 📈Metrics
    • Explore Metrics
    • Custom charts
      • Adding custom charts to components
      • Writing PromQL queries for representative charts
      • Troubleshooting custom charts
    • Advanced Metrics
      • Grafana Datasource
      • Prometheus remote_write
      • OpenMetrics
  • 📑Logs
    • Explore Logs
    • Log Shipping
  • 🔭Traces
    • Explore Traces
  • 📖Health
    • Health synchronization
    • Send health data over HTTP
      • Send health data
      • Repeat Snapshots JSON
      • Transactional Increments JSON
    • Debug health synchronization
  • 🔍Views
    • Kubernetes views
    • Custom views
    • Component views
    • Explore views
    • View structure
      • Overview perspective
      • Highlights perspective
      • Topology perspective
      • Events perspective
      • Metrics perspective
      • Traces perspective
      • Filters
      • Keyboard shortcuts
    • Timeline and time travel
  • 🕵️Agent
    • Network configuration
      • Proxy Configuration
    • Using a custom registry
    • Custom Secret Management
      • Custom Secret Management (Deprecated)
    • Request tracing
      • Certificates for sidecar injection
  • 🔭Open Telemetry
    • Overview
    • Getting started
      • Concepts
      • Kubernetes
      • Kubernetes Operator
      • Linux
      • AWS Lambda
    • Open telemetry collector
      • Sampling
      • SUSE Observability OTLP APIs
    • Instrumentation
      • Java
      • Node.js
        • Auto-instrumentation of Lambdas
      • .NET
      • SDK Exporter configuration
    • Troubleshooting
  • CLI
    • SUSE Observability CLI
  • 🚀Self-hosted setup
    • Install SUSE Observability
      • Requirements
      • Kubernetes / OpenShift
        • Kubernetes install
        • OpenShift install
        • Alibaba Cloud ACK install
        • Required Permissions
        • Override default configuration
        • Configure storage
        • Exposing SUSE Observability outside of the cluster
      • Initial run guide
      • Troubleshooting
        • Advanced Troubleshooting
        • Support Package (Logs)
    • Configure SUSE Observability
      • Slack notifications
      • E-mail notifications
      • Stackpacks
      • Advanced
        • Analytics
    • Release Notes
      • v2.0.0 - 11/Sep/2024
      • v2.0.1 - 18/Sep/2024
      • v2.0.2 - 01/Oct/2024
      • v2.1.0 - 29/Oct/2024
      • v2.2.0 - 09/Dec/2024
      • v2.2.1 - 10/Dec/2024
      • v2.3.0 - 30/Jan/2025
      • v2.3.1 - 17/Mar/2025
      • v2.3.2 - 22/Apr/2025
      • v2.3.3 - 07/May/2025
    • Upgrade SUSE Observability
      • Migration from StackState
      • Steps to upgrade
      • Version-specific upgrade instructions
    • Uninstall SUSE Observability
    • Air-gapped
      • SUSE Observability air-gapped
      • SUSE Observability Kubernetes Agent air-gapped
    • Data management
      • Backup and Restore
        • Kubernetes backup
        • Configuration backup
      • Data retention
      • Clear stored data
    • Security
      • Authentication
        • Authentication options
        • Single password
        • File-based
        • LDAP
        • Open ID Connect (OIDC)
          • Microsoft Entra ID
        • KeyCloak
        • Service tokens
        • Troubleshooting
      • RBAC
        • Role-based Access Control
        • Permissions
        • Roles
        • Scopes
      • Self-signed certificates
      • External secrets
  • 🔐Security
    • Service Tokens
    • API Keys
  • ☁️SaaS
    • User Management
  • Reference
    • SUSE Observability Query Language (STQL)
    • Chart units
    • Topology Identifiers
Powered by GitBook
LogoLogo

Legal notices

  • Privacy
  • Cookies
  • Responsible disclosure
  • SOC 2/SOC 3
On this page
  • Overview
  • Dynamic Threshold Monitor example
  1. Monitors and alerts
  2. Customize

Dynamic Threshold monitor

SUSE Observability

Overview

For metrics that vary significantly over time and differ from service to service, a Dynamic Threshold monitor provides simple and performant anomaly detection. It uses data from 1, 2 or 3 weeks ago in addition to the recent past as context to compare current data to.

Data from the "check window" is compared to that provided by the historic context using the Anderson-Darling test. This imposes very little assumptions on the data distribution. The test is particularly sensitive to outliers on the upper and lower ends of the distribution. The metric can be smooth, spiky or have a couple of "levels" - as data values are compared directly, without any model fitting, the Dynamic Threshod monitor is very robust.

For metrics that vary smoothly over time (e.g. on a timescale of 5 minutes), the effective number of data points is smaller than the raw number. The DT compensates for this so the same monitor can be used for a wide range of metrics without the need for adjusting its parameters.

There are a couple of parameters that can be set for the monitor function:

  • falsePositiveRate: say !!float 1e-8 - the sensitivity of the monitor to deviating behavior. A lower value suppresses more (false) positives but may also lead to false negatives (unnoticed anomalies).

  • checkWindowMinutes: say 10 minutes - the check window needs to be balanced between quick alerting (small values) and correctly identified anomalies (high values). A handful of data points works well in practice.

  • historicWindowMinutes: say 120 (2 hours) - bracketed around the current time, but then one or more weeks ago - so from 1 hour before the current time to 1 hour after. Also the 2 hours before the check window are used. The dynamic threshold monitor compares the distribution of this historic data with the data points in the check window.

  • historySizeWeeks: say 2 - the number of weeks that data is taken from for historic context. Can be 1, 2 or 3.

  • removeTrend: for metrics that have trend behavior (say, number of requests), such that the absolute value differs from week to week, this trend (the average value) can be accounted for.

  • includePreviousDay: typically false - for metrics that do not have a weekly but only a daily pattern, this allows the use of more recent data

Dynamic Threshold Monitor example

A Monitor implemented using the Dynamic Threshold function looks like:

  - _type: "Monitor"
    name: "<name of the monitor>"
    identifier: "urn:custom:monitor:<identifier for the monitor>"
    status: "DISABLED"
    description: "<description>"
    function: {{ get "urn:stackpack:aad-v2:shared:monitor-function:dt" }}
    arguments:
      telemetryQuery:
        query: "<metric to bind to component>"
        unit: s
        aliasTemplate: "<name for the metric>"
      topologyQuery: <topology query for the components to bind to>
      falsePositiveRate: <floating point number, e.g. !!float 1e-8>
      checkWindowMinutes: <integer, e.g. 10>
      historicWindowMinutes: <integer, e.g. 120>
      historySizeWeeks: <integer: 1, 2 or 3>
      includePreviousDay: <boolean>
      removeTrend: <boolean>
    intervalSeconds: 60
    remediationHint: "<how to remediate deviating states>"
PreviousDerived State monitorNextOverride monitor arguments

Last updated 20 days ago

The monitor can be implemented using the guide at

🚨
Add a threshold monitor to components using the CLI