Checks and telemetry streams

Overview

Checks are the mechanisms through which elements (components and relations) get a health state. The state of an element is determined from data in the associated telemetry streams.

Checks

Checks determine the health state of an element by monitoring one or more telemetry streams. Each telemetry stream supplies either metrics (time-series) or events (logs and events) data.

Check Functions

StackState checks are based on check functions: reusable, user-defined scripts that specify when a health state should be returned. This makes checks particularly powerful, allowing StackState to monitor any number of available telemetry streams. For example, you could write a check function to answer questions such as:

  • Are we seeing a normal amount of hourly traffic?

  • Have there been any fatal exceptions logged?

  • What state did other systems report?

A check function receives parameter inputs and returns an output health state. Each time a check function is executed, it updates the health state of the checks it ran for. If a check function does not return a health state, the health state of the check remains unchanged.
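The behavior described above can be sketched in a few lines. This is a minimal illustration, not the StackState scripting API: it is written in Python for readability, and the names `HealthState`, `traffic_check`, `run_check`, the state names, and the threshold parameters are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional, Sequence


class HealthState(Enum):
    # Assumed state names, for illustration only.
    CLEAR = "CLEAR"
    DEVIATING = "DEVIATING"
    CRITICAL = "CRITICAL"


def traffic_check(hourly_requests: Sequence[float],
                  deviating_below: float,
                  critical_below: float) -> Optional[HealthState]:
    """Hypothetical check function: is hourly traffic at a normal level?"""
    if not hourly_requests:
        # No data: return nothing, so the health state stays unchanged.
        return None
    latest = hourly_requests[-1]
    if latest < critical_below:
        return HealthState.CRITICAL
    if latest < deviating_below:
        return HealthState.DEVIATING
    return HealthState.CLEAR


def run_check(current: HealthState,
              result: Optional[HealthState]) -> HealthState:
    """Keep the current state when the check function returned no state."""
    return current if result is None else result
```

The parameters (`deviating_below`, `critical_below`) stand in for the parameter inputs of a check function, and `run_check` models the rule that a missing return value leaves the existing health state in place.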

Telemetry streams

A telemetry stream is a real-time stream of either metric or event data coming from an external monitoring system.

  • Metrics: Metric, or time-series, data consists of numeric values over time. A metric can represent any kind of measurement, such as a count or a percentage.

  • Events: An event is a JSON-style data object with a set of properties. An event may represent a log entry or state information coming from an external system. StackState can synchronize the checks of external systems, such as OpsView or Nagios: these systems report check changes to StackState as events. A check then inspects the event data and can translate it into the state of an element in StackState.
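As a rough sketch of how a check might translate an externally reported check change into an element state, consider the Python example below. The event layout, the `status` property, the status values, and the health state names are all hypothetical; real events from OpsView or Nagios will have their own structure.

```python
import json
from typing import Optional

# Assumed mapping from an external system's check status to a health state.
STATUS_TO_HEALTH = {
    "OK": "CLEAR",
    "WARNING": "DEVIATING",
    "CRITICAL": "CRITICAL",
}


def event_check(raw_event: str) -> Optional[str]:
    """Hypothetical check: map one JSON event to a health state.

    Returns None for an unknown or missing status, which leaves the
    current health state unchanged.
    """
    event = json.loads(raw_event)
    status = event.get("status")  # hypothetical property name
    return STATUS_TO_HEALTH.get(status)
```

For example, `event_check('{"host": "db-1", "status": "WARNING"}')` yields `"DEVIATING"`, while an event without a recognized status yields `None`.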

Telemetry stream providers

Telemetry streams are supplied via plugins. Each plugin provides one or more types of telemetry stream. For example, the Graphite plugin provides StackState with a metrics telemetry stream, while the Elasticsearch plugin provides both metrics and events telemetry streams.

Add telemetry streams

In StackState, telemetry streams need to be linked to elements (components or relations). Once a telemetry stream has been linked to an element it can be used as an input for the element's checks. Telemetry streams can also be defined in templates and attached automatically to elements when they are imported by a synchronization.
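The relationship between elements, linked streams, and check inputs can be pictured with a small data model. This is only a conceptual sketch in Python; the classes `TelemetryStream` and `Element` and their methods are invented for illustration and do not reflect how StackState stores or exposes this information.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TelemetryStream:
    name: str
    kind: str  # "metrics" or "events"


@dataclass
class Element:
    """Hypothetical component or relation with its linked streams."""
    name: str
    streams: Dict[str, TelemetryStream] = field(default_factory=dict)

    def link(self, stream: TelemetryStream) -> None:
        # Only linked streams are available to this element's checks.
        self.streams[stream.name] = stream

    def check_inputs(self, kind: str) -> List[str]:
        # Names of linked streams of the given kind, sorted for stability.
        return sorted(s.name for s in self.streams.values() if s.kind == kind)


db = Element("orders-db")
db.link(TelemetryStream("cpu_usage", "metrics"))
db.link(TelemetryStream("error_log", "events"))
```

Here a check on `orders-db` that needs metric data could use `cpu_usage`, because that stream has been linked to the element. A stream that was never linked would not appear among its inputs.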

Read how to add a telemetry stream to an element or how to add telemetry during topology synchronization.

Baselines

A baseline can be attached to a metric stream. The baseline consists of an average, a lowerDeviation and a higherDeviation for batches of metric data. Checks can use the baseline values on a metric stream to trigger an alert if a batch of metrics deviates from the baseline.
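A baseline-driven check could look roughly like the Python sketch below. It assumes that the deviations are absolute offsets from the average, so that the normal range is `average - lowerDeviation` to `average + higherDeviation`, and that a batch is summarized by its mean. Both are assumptions for this example, as are the names `Baseline` and `deviates`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class Baseline:
    average: float
    lowerDeviation: float   # assumed: allowed distance below the average
    higherDeviation: float  # assumed: allowed distance above the average


def deviates(batch: Sequence[float], baseline: Baseline) -> bool:
    """Return True when the batch mean falls outside the baseline range.

    The batch must be non-empty; statistics.mean raises StatisticsError
    on an empty sequence.
    """
    batch_mean = mean(batch)
    lower = baseline.average - baseline.lowerDeviation
    upper = baseline.average + baseline.higherDeviation
    return batch_mean < lower or batch_mean > upper
```

With `Baseline(average=100.0, lowerDeviation=10.0, higherDeviation=20.0)`, a batch averaging between 90 and 120 is treated as normal, and anything outside that range would be a candidate for an alert.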

Read more about anomaly detection with baselines and baseline functions.

See also