Autonomous Anomaly Detector

StackState Self-hosted v5.1.x


Last updated 2 years ago

Overview

Anomaly detection identifies abnormal behavior in your fast-changing IT environment. This helps direct the attention of IT operators to the root cause of problems or can give an early warning. The Autonomous Anomaly Detector (AAD) requires zero configuration. It's fully autonomous in selecting both the metric streams it will apply anomaly detection to, and the appropriate machine learning algorithms to use for each metric stream.

The AAD supports daily and weekly seasonality and reports an anomaly when the observed values differ significantly from the expected values. Daily seasonality is enabled by default.

Note that the AAD requires a training period before it can begin to report anomalies.

The anomaly detection process

The Autonomous Anomaly Detector (AAD) is enabled as soon as the AAD StackPack has been installed in StackState. When the AAD has been enabled, metric streams are identified and analyzed in search of any anomalous behavior based on their past. After the initial training period, detected anomalies will be reported in the following way:

  • The identified anomaly is given a severity (HIGH, MEDIUM or LOW).

  • The anomaly and the time period during which anomalous behavior was detected are shown on the associated metric stream chart. The color indicates the anomaly severity.

  • If the anomaly is considered to have a severity level of HIGH, an anomaly event is generated.

Anomaly severity

Each identified anomaly is given a severity. This can be HIGH, MEDIUM or LOW. The severity shows how far a metric point has deviated from the expected model and the length of time for which anomalous data has been observed.

Severity | Description
🟥 HIGH (red) | Reported only when data points with a low probability of occurrence are observed for at least 3 minutes. The least often reported severity. Generates an anomaly event.
🟧 MEDIUM (orange) | Reported for anomalous data observed for a short period of time or slightly anomalous data observed for a longer period of time. Reported less often than LOW severity and more often than HIGH severity anomalies. Useful for root cause analysis and can offer extra insight into HIGH severity anomalies reported on the stream.
🟨 LOW (yellow) | Reported when slightly anomalous data is observed. The most often reported anomaly severity. Less frequent occurrences of LOW severity anomalies indicate a higher reliability of anomaly reports from the AAD.
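The way severity combines the size of a deviation with how long it lasts can be illustrated with a small sketch. This is not the AAD's actual algorithm: the function name, the probability cut-offs and the "longer period" value are hypothetical, chosen only to make the relationship concrete. The only number taken from this page is the 3-minute minimum for HIGH severity.

```python
from typing import Optional

# Hypothetical thresholds, for illustration only. The AAD's real models
# and cut-offs are not described on this page.
LOW_PROBABILITY = 0.001     # assumed meaning of "low probability of occurrence"
SLIGHT_PROBABILITY = 0.05   # assumed meaning of "slightly anomalous"
HIGH_MIN_MINUTES = 3        # from this page: "for at least 3 minutes"
LONGER_PERIOD_MINUTES = 10  # assumed meaning of "a longer period of time"


def classify_severity(probability: float, duration_minutes: float) -> Optional[str]:
    """Map an observed interval to HIGH, MEDIUM, LOW or None (not anomalous)."""
    strongly_anomalous = probability <= LOW_PROBABILITY
    slightly_anomalous = probability <= SLIGHT_PROBABILITY
    if strongly_anomalous and duration_minutes >= HIGH_MIN_MINUTES:
        return "HIGH"
    if strongly_anomalous or (
        slightly_anomalous and duration_minutes >= LONGER_PERIOD_MINUTES
    ):
        return "MEDIUM"
    if slightly_anomalous:
        return "LOW"
    return None
```

In this sketch, a strongly anomalous interval that lasts under 3 minutes is rated MEDIUM rather than HIGH, which mirrors the description of MEDIUM as "anomalous data observed for a short period of time".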

Anomaly events

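When a HIGH severity anomaly is detected on a metric stream, a Metric Stream Anomaly event is generated. Anomaly events are listed on the Events Perspective and will also be reported as one of the Probable Causes for any associated problem. Select an event to display detailed information about it in the right panel details tab - Event details.

  • Metric Stream - The name of the metric stream on which the anomaly was detected.

  • Severity - The anomaly severity. Anomaly events are only generated for HIGH severity anomalies.

  • Metric chart - A chart with an extract from the metric stream centered around the detected anomaly.

  • Anomaly interval - The time period during which anomalous behavior was detected. This is also shaded on the metric chart.

  • Description - A description of the observed anomaly.

  • Elements - The name of the element (or elements) to which the metric stream is attached.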

Anomaly feedback

Note that feedback isn't used to train the running instance of the AAD.

Models are selected by the AAD and optimized for each metric stream. The quality of the anomalies reported is determined to a large extent by how well the selected model describes the stream that it runs on. The StackState team works with representative datasets to develop new models and optimize the hyperparameters used for model selection and training the AAD.

To enable improvement of the AAD, users can add feedback to reported anomalies. This feedback can then be used by StackState to assist in the ongoing development of the AAD.

The feedback sent to StackState consists of:

  • Thumbs-up, Thumbs-down votes - Each user can cast one vote per reported anomaly.

  • Comments - Free-form text entered by users. Note that any comments added to an anomaly will be included in the feedback sent to StackState. Take care not to include sensitive data in comments.

  • Anomaly details - The description, interval, severity (score), model information, metric query, and element and stream names.

  • Metric data - Data from the metric stream leading up to the anomaly.

Use the StackState CLI to export anomaly feedback ready to send to StackState.

Installation

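Prerequisites

The AAD StackPack can only be installed within a Kubernetes setup. Please make sure that this is supported by your StackState installation.

If you aren't sure whether you have a Kubernetes setup, or would like to know more, contact StackState support.

Install the AAD StackPack

To install the AAD StackPack, simply press the INSTALL button. No other actions need to be taken. The AAD requires a training period before it can begin to report anomalies.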

Training period

The AAD will need to train on your data before it can begin reporting anomalies. With data collected in 1-minute buckets, the AAD requires a 2-hour training period. If historic data exists for relevant metric streams, this will also be used for training the AAD. In this case, the first results can be expected within an hour. Up to three days of data are used for training. After the initial training, the AAD will continuously refine its model and adapt to any changes in the data.

For weekly seasonality, the training period needs to be extended to at least three weeks of data. With fine-grained metrics data, this puts a considerable load on the metrics store, and the access pattern is quite different from other typical uses of metric data. Enabling weekly seasonality must therefore be validated to prevent degraded performance.
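To get a feel for why weekly seasonality loads the metrics store more heavily, the training windows above can be converted into data points per metric stream. The sketch below is plain arithmetic on the figures given on this page (1-minute buckets, 2 hours, three days, three weeks). The helper name `points_per_stream` is hypothetical and not part of StackState.

```python
MINUTES_PER_DAY = 24 * 60


def points_per_stream(period_minutes: int, bucket_minutes: int = 1) -> int:
    """Number of data points one metric stream yields over a training window."""
    return period_minutes // bucket_minutes


initial_training = points_per_stream(2 * 60)              # 2-hour training period
daily_training = points_per_stream(3 * MINUTES_PER_DAY)   # up to three days
weekly_training = points_per_stream(21 * MINUTES_PER_DAY) # at least three weeks

print(initial_training)                  # 120
print(daily_training)                    # 4320
print(weekly_training)                   # 30240
print(weekly_training / daily_training)  # 7.0
```

With 1-minute buckets, training for weekly seasonality therefore reads at least seven times as many points per stream as the three-day window does.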

Frequently Asked Questions

How are metric streams selected?

The AAD scales to large environments by autonomously prioritizing metric streams based on its knowledge of the 4T data model and the stream priority defined by users. The metric stream selection algorithm ranks metric streams based on the criteria below:

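  • The top ranking is given to metric streams with anomaly health checks.

  • Components in views that have the most stars by the most users are ranked highest.

  • From those components, the metric streams with the highest priorities are ranked highest. See how to set the priority for a stream.

  • Anomaly detection will be disabled on streams if more than 20% of their time is flagged as anomalous.

You can't directly control which streams are selected, but you can steer the metric stream selection of the AAD by manipulating the above-mentioned factors.

Know what the AAD is working on. The status UI of the AAD provides various metrics and indicators, including details of what the AAD is currently doing.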
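The selection criteria can be sketched as a filter followed by a sort. This is an illustration under stated assumptions, not the AAD's implementation: the `MetricStream` fields, the numeric `priority` scale and the strict lexicographic ordering of the criteria are all hypothetical. Only the 20% cut-off comes from this page.

```python
from dataclasses import dataclass
from typing import List

MAX_ANOMALOUS_FRACTION = 0.20  # from this page: disabled above 20% anomalous time


@dataclass
class MetricStream:
    # All field names are hypothetical, introduced for this sketch.
    name: str
    has_anomaly_health_check: bool
    view_stars: int            # stars on views containing the stream's component
    priority: int              # assumed scale: larger means higher priority
    anomalous_fraction: float  # share of time flagged as anomalous, 0.0 to 1.0


def rank_streams(streams: List[MetricStream]) -> List[MetricStream]:
    """Drop streams that are anomalous too often, then rank the rest."""
    eligible = [s for s in streams if s.anomalous_fraction <= MAX_ANOMALOUS_FRACTION]
    # Assumed ordering: anomaly health check first, then stars, then priority.
    return sorted(
        eligible,
        key=lambda s: (s.has_anomaly_health_check, s.view_stars, s.priority),
        reverse=True,
    )


streams = [
    MetricStream("cpu", False, 5, 2, 0.01),
    MetricStream("latency", True, 1, 1, 0.05),
    MetricStream("noisy", True, 9, 3, 0.35),
    MetricStream("errors", False, 5, 3, 0.00),
]
print([s.name for s in rank_streams(streams)])  # ['latency', 'errors', 'cpu']
```

Here "noisy" is excluded despite its high ranking factors, because it is flagged as anomalous more than 20% of the time.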

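How fast are anomalies detected?

After an initial training period, the AAD ensures that prioritized metric streams are checked for anomalies in a timely fashion. Anomalies occurring in the highest prioritized metric streams are detected within about 5 minutes.

Can anomalies trigger alerts?

Yes. The AAD itself doesn't alert on anomalies found, but anomaly health checks can be added to components to automatically change the health status of the component to DEVIATING. This health state change event can then trigger notifications by adding an event handler to a view.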

Uninstall

To uninstall the AAD StackPack, simply press the UNINSTALL button. No other actions need to be taken.

Release Notes

Autonomous Anomaly Detector StackPack v0.9.2 (02-04-2021)

  • Common version bumped from 2.4.3 to 3.0.0

  • StackState min version bumped to 4.3.0

See also


  • Anomaly detection

  • Anomaly health checks