Anomaly Detection with Baselines

How to configure anomaly detection with baselines.

This page describes StackState version 4.0.

The StackState 4.0 version range is End of Life (EOL) and no longer supported. We encourage customers still running the 4.0 version range to upgrade to a more recent release.

Go to the documentation for the latest StackState release.

Baselines are a way to detect anomalies in metric streams. Generally speaking, an anomaly is detected when a metric stream exceeds its baseline boundaries. A baseline consists of a lower and an upper boundary, which together form a band that the metric is expected to stay within under normal conditions. Baselines are initially derived from historical data, but are continuously updated as new data flows in. Thus, when an anomaly occurs, the baseline gradually updates to take the anomaly into account.

How baselining works

The process for detecting anomalies using baselines consists of two steps:

  1. A baseline function enriches a metric stream with a baseline, transforming it into a baseline metric stream. Baseline functions continuously recalculate the baseline based on the given batch size.

  2. A check determines the health state of a component or relation based on the metrics in the metric stream and its baseline. Once a metric stream is a baseline metric stream, check functions that support such baseline metric streams are available for selection.
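The two steps above can be illustrated with a minimal Python sketch. This is not StackState code: the `BaselinePoint` type, the `enrich_with_baseline` and `check_health` functions, the mean plus or minus standard deviation band, and the `CLEAR` state name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class BaselinePoint:
    """One batch of a hypothetical baseline metric stream."""
    value: float  # aggregated metric value for the batch
    lower: float  # lower baseline boundary
    upper: float  # upper baseline boundary


def enrich_with_baseline(values: List[float], batch_size: int,
                         width: float = 3.0) -> List[BaselinePoint]:
    """Step 1: turn raw metric values into baseline points, one per batch.

    The band for each batch is derived from earlier batches only
    (mean +/- width * standard deviation). This is a stand-in for a real
    baseline function, not the algorithm StackState uses.
    """
    batches = [mean(values[i:i + batch_size])
               for i in range(0, len(values), batch_size)]
    points = []
    for i, value in enumerate(batches):
        history = batches[:i]
        if len(history) < 2:
            continue  # not enough history to form a band yet
        centre, spread = mean(history), pstdev(history)
        points.append(BaselinePoint(value,
                                    centre - width * spread,
                                    centre + width * spread))
    return points


def check_health(point: BaselinePoint) -> str:
    """Step 2: map a baseline point to an (assumed) health state name."""
    inside = point.lower <= point.value <= point.upper
    return "CLEAR" if inside else "DEVIATING"


values = [10.0, 12.0, 11.0, 9.0, 10.0, 12.0, 11.0, 9.0, 50.0, 50.0]
for point in enrich_with_baseline(values, batch_size=2):
    print(point, check_health(point))
```

In this sketch a larger batch size produces fewer, smoother baseline points, which also means a check has to wait longer for the next point before it can change state.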

Configuring a baseline for a metric stream

To configure a baseline for a metric stream, go to the metric stream on a component or relation and select "add baseline" from the metric stream context menu (accessed through the triple dots next to the name of the metric stream).

Baseline dialog

In the baseline dialog fill in the following values:

Preview

Below the horizontal line you can run a preview of the baseline, so you may tune it to your liking. Select a time range for the preview and press the preview button.

Baseline functions

Baseline functions are configurable in StackState and can be coded in the StackState Scripting Language. By default the following baseline functions are supplied:

Function: Stationary Auto-Tuned Baseline

This is always a good default choice; it works well for stationary as well as seasonal metrics.

This baseline function works well for stationary metrics (e.g. data center temperature, average response time, error count). Under the hood it uses the Exponential Weighted Moving Average (EWMA) algorithm, but tunes that algorithm automatically.
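To give an idea of how an EWMA-based band behaves, here is a minimal Python sketch. It is a generic textbook formulation, not StackState's implementation: the function name `ewma_band` and the parameters `alpha` (smoothing factor) and `width` (band width in standard deviations) are hypothetical, and the auto-tuning itself is not shown.

```python
import math
from typing import List, Tuple


def ewma_band(values: List[float], alpha: float = 0.3,
              width: float = 3.0) -> List[Tuple[float, float]]:
    """Return a (lower, upper) band for every value.

    Each band is computed from the values seen *before* that point, using an
    exponentially weighted mean and variance. `alpha` and `width` are
    illustrative parameters, not StackState arguments.
    """
    if not values:
        return []
    avg = values[0]  # exponentially weighted mean
    var = 0.0        # exponentially weighted variance
    bands = []
    for x in values:
        std = math.sqrt(var)
        bands.append((avg - width * std, avg + width * std))
        # Incremental update: recent values weigh more than old ones.
        diff = x - avg
        avg += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return bands


series = [10.0] * 5 + [20.0, 20.0]
for x, (lower, upper) in zip(series, ewma_band(series)):
    state = "inside" if lower <= x <= upper else "outside"
    print(f"value={x:5.1f} band=({lower:6.2f}, {upper:6.2f}) {state}")
```

In this example the first jump to 20.0 falls outside the band, while the second one falls inside it because the band has already widened. This mirrors the behaviour described earlier: the baseline gradually updates to take an anomaly into account.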

Pros:

  • Works reasonably well for stationary as well as seasonal metrics given any distribution of the metric stream.

  • Requires no knowledge of the underlying algorithm.

Cons:

  • Does not assume seasonality. It does not assume yesterday looks similar to today or that any such seasonal patterns should occur.

  • Very little control.

Arguments:

Function: Median Absolute Deviation

This baseline function works well for seasonal metrics (e.g. logged-in user count, online orders placed per minute). It assumes that the metrics of the last days or last weeks (the fundamental period) are similar to those of today or this week.
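The idea of comparing each point with the same moment in earlier periods can be sketched in Python as follows. This is an illustrative formulation under stated assumptions, not StackState's implementation: the function name `mad_band`, the `period` argument (the fundamental period expressed as a number of data points), and the `width` multiplier are all hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def mad_band(values: List[float], period: int,
             width: float = 3.0) -> List[Optional[Tuple[float, float]]]:
    """Return a (lower, upper) band per value, or None without enough history.

    The band for index i is built from the values at the same phase in
    earlier periods (i - period, i - 2 * period, ...): the median of those
    values plus or minus `width` times their median absolute deviation.
    """
    bands: List[Optional[Tuple[float, float]]] = []
    for i in range(len(values)):
        history = values[i % period:i:period]  # same phase, earlier periods
        if len(history) < 2:
            bands.append(None)
            continue
        centre = median(history)
        mad = median([abs(v - centre) for v in history])
        bands.append((centre - width * mad, centre + width * mad))
    return bands


# Two data points per period: a "low" phase and a "high" phase.
series = [10, 100, 12, 104, 11, 98, 13, 102, 12, 160]
bands = mad_band(series, period=2)
print(bands[8])  # band for the low phase, built from 10, 12, 11, 13
print(bands[9])  # band for the high phase, built from 100, 104, 98, 102
```

Because every phase needs several earlier periods before a band can be formed, a longer fundamental period requires correspondingly more history.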

When to choose?

Choose this function for metric streams that are seasonal by day or by week. It also works reasonably well for stationary metrics, but it is not the recommended baseline function for them.

Pros:

  • Works reasonably well for seasonal as well as stationary metric streams.

Cons:

  • Assumes daily or weekly patterns. If such patterns are absent, this algorithm will not produce good results.

  • Assumes the data is normally distributed.

  • You have to specify the fundamental period yourself; it is not detected automatically.

  • Requires a lot of data for weekly patterns.

Arguments:

Function: Stationary Customizable Baseline based on EWMA

This baseline function works well for stationary metrics (e.g. data center temperature, average response time, error count). It uses the Exponential Weighted Moving Average (EWMA) algorithm. It is the same as the Stationary Auto-Tuned Baseline, but leaves the tuning up to you.

Pros:

  • Works well for seasonal as well as stationary metric streams given any distribution of the metric stream.

  • Provides a lot of controls for tuning.

Cons:

  • Tuning requires knowledge of the algorithm as well as domain knowledge about the metric stream.

  • Does not assume seasonality. It does not assume yesterday looks similar to today or that any such seasonal patterns should occur.

Arguments:

Checking for anomalies on a baseline metric stream

Once you've added a baseline to a metric stream and you see the baseline bounds drawn on top of the metric stream chart, you can configure a check to alert on anomalies.

  1. On the component/relation details pane with the baseline metric stream on it, click on the "add" button next to "health", so as to create a new health check.

  2. Select the "Detect anomaly by checking if the metric values are within upper and lower deviation bounds" check function.

  3. Select the baseline metric stream you want to check for anomalies.

  4. Select a critical and a deviating value. These floating point values are factors that indicate how far the metric stream may exceed the baseline before the health state changes. For example:

    • deviatingValue = 1.0 - if the metric exceeds the baseline, then the check will go to the DEVIATING health state.

    • criticalValue = 1.25 - if the metric exceeds the baseline by 125%, then the check will go to the CRITICAL health state.

  5. Click create to add the check.
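One way to read the deviating and critical values is as factors that widen the baseline band before a state change is triggered. The Python sketch below follows that reading. It is an assumption made for illustration: the exact formula used by the check function is not described on this page, and the function name `health_state` and the `CLEAR` state name are hypothetical.

```python
def health_state(value: float, lower: float, upper: float,
                 deviating_value: float = 1.0,
                 critical_value: float = 1.25) -> str:
    """Classify a metric value against a baseline band.

    Assumption: each factor scales the distance between the centre of the
    band and its boundaries. A factor of 1.0 is the band itself; 1.25 is a
    band that is 25% wider on both sides.
    """
    centre = (lower + upper) / 2
    half_width = (upper - lower) / 2
    distance = abs(value - centre)
    if distance > critical_value * half_width:
        return "CRITICAL"
    if distance > deviating_value * half_width:
        return "DEVIATING"
    return "CLEAR"


# Baseline band of 80..120: centre 100, half-width 20.
for value in (110.0, 123.0, 130.0):
    print(value, health_state(value, lower=80.0, upper=120.0))
```

Under this reading, a deviating value of 1.0 reacts as soon as the metric leaves the band, and raising either factor makes the check more tolerant.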

Once you've added the check function, it may take 5 or more minutes (depending on the baseline batch size) before the check changes health state.

Alerting on checks based on baseline metric streams works exactly the same as with other checks.
