Monitor STJ file format

StackState Self-hosted v5.1.x


Last updated 2 years ago

Overview

Monitors can be attached to any number of elements in the StackState topology to calculate a health state based on 4T data. Each monitor consists of a monitor definition and a monitor function. Monitors are created and managed by StackPacks. You can also create custom monitors and monitor functions outside of a StackPack without having to modify any configuration.

STJ file format

Monitors in StackState are represented textually using the STJ file format. The following snippet presents an example monitor file:

{
  "_version": "1.0.39",
  "timestamp": "2022-05-23T13:16:27.369269Z[GMT]",
  "nodes": [
    {
      "_type": "Monitor",
      "name": "CPU Usage",
      "description": "A simple CPU-usage monitor. If the metric is above a given threshold, the state is set to CRITICAL.",
      "identifier": "urn:system:default:monitor:cpu-usage",
      "remediationHint": "Turn it off and on again.",
      "function": {{ get "urn:system:default:monitor-function:metric-above-threshold" }},
      "arguments": [{
        "_type": "ArgumentDoubleVal",
        "parameter": {{ get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=threshold" }},
        "value": 90.0
      }, {
        "_type": "ArgumentStringVal",
        "parameter": {{ get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=topologyIdentifierPattern" }},
        "value": "urn:host:/${tags.host}"
      }, {
        "_type": "ArgumentScriptMetricQueryVal",
        "parameter": {{ get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=metrics" }},
        "script": "Telemetry\n.query('StackState Metrics', '')\n.metricField('system.cpu.system')\n.groupBy('tags.host')\n.start('-1m')\n.aggregation('mean', '15s')"
      }],
      "status": "ENABLED",
      "tags": ["demo"],
      "intervalSeconds": 60
    }
  ]
}

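In addition to the usual elements of an STJ file (the protocol version and timestamp), the snippet defines a single node of type Monitor.

The supported fields are:

  • name - a human-readable name that briefly describes the operating principle of the monitor.

  • description - a longer, more in-depth description of the monitor.

  • identifier - a StackState-URN-formatted value that uniquely identifies this monitor definition. For more details, see the identifier section below.

  • function - the specific monitor function to use as the basis of computation for this monitor. For more details, see the function section below.

  • arguments - lists concrete values that are to be used for parameters in the monitor function invocation. For more details and descriptions of commonly used parameters, see the arguments section below.

  • remediationHint - a short, markdown-enabled hint displayed whenever the validation rule represented by this monitor triggers and results in an unhealthy state.

  • status - either ENABLED or DISABLED. Dictates if the monitor will be running and producing health states. Optional. If not specified, the previous status will be used (DISABLED for newly created monitors).

  • tags - tags associated with the monitor.

  • intervalSeconds - dictates how often to execute this particular monitor; new executions are scheduled after the specified number of seconds, counting from the time that the last execution ended. For more details, see the intervalSeconds section below.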

Field information

identifier

An important field of the monitor node is the identifier. It is a unique value in the StackState URN format that can be used together with the monitor-specific StackState CLI commands. The identifier should be formatted as follows:

urn : <prefix> : monitor : <unique-monitor-identification>

  • The <prefix> is described in more detail in topology identifiers.

  • The <unique-monitor-identification> is user-definable and free-form.
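
As a minimal sketch of the identifier format above, the following Python helpers join and recognize monitor URNs. The function names are hypothetical and are not part of any StackState tooling; the only rule taken from this page is the `urn:<prefix>:monitor:<unique-monitor-identification>` layout.

```python
import re


def build_monitor_urn(prefix: str, monitor_id: str) -> str:
    """Join the parts as urn:<prefix>:monitor:<unique-monitor-identification>."""
    if not prefix or not monitor_id:
        raise ValueError("prefix and monitor_id must be non-empty")
    return f"urn:{prefix}:monitor:{monitor_id}"


def is_monitor_urn(value: str) -> bool:
    """Check only the overall shape; this is not an official validator."""
    return re.fullmatch(r"urn:.+:monitor:.+", value) is not None


# Rebuilds the identifier used by the example monitor on this page.
print(build_monitor_urn("system:default", "cpu-usage"))
```

The shape check deliberately rejects monitor function identifiers such as `urn:system:default:monitor-function:metric-above-threshold`, because their type segment is `monitor-function` rather than `monitor`.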

function

Each monitor configured in StackState uses a monitor function to compute the health state results that are attached to the elements.

Monitor functions are scripts that accept 4T data as input, check the data based on some internal logic and output health state mappings for the affected topology elements. The function is run periodically by the monitor runner (at the configured intervalSeconds). The monitor function is responsible for detecting any changes in the data that can be considered to change an element's health state.

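You can list the available monitor functions using the CLI. With the new sts CLI:

sts settings list --type MonitorFunction

From StackState v5.0, the old sts CLI has been renamed to stac and there is a new sts CLI. The command(s) provided here are for use with the new sts CLI.

➡️ Check which version of the sts CLI you are running

With the stac CLI (deprecated):

stac graph list MonitorFunction

⚠️ From StackState v5.0, the old sts CLI is called stac. The old CLI is now deprecated.

The new sts CLI replaces the stac CLI. It's advised to install the new sts CLI and upgrade any installed instance of the old sts CLI to stac. For details see:

  • Which version of the sts CLI am I running?

  • Install the new sts CLI and upgrade the old sts CLI to stac

  • Comparison between the CLIs

You can create a custom monitor function to customize how StackState processes 4T data.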

arguments

The arguments defined in the monitor STJ definition should match the parameters defined in the monitor function STJ definition. See below for examples of how to set commonly used parameters.

The parameter binding syntax is common to all parameter types and uses the following format:

{
  "_type": "<type-of-the-parameter>",
  "parameter": {{ get "<identifier-of-the-function>" "Type=Parameter;Name=<name-of-the-parameter>" }},
  "value": "<value-of-the-parameter>"
}

  • _type - the type of the parameter.

  • parameter - a reference to the concrete instance of a parameter within a function's parameter list. The Name must match the name specified in the monitor function.

  • value - the value of the parameter to pass to the monitor function.

During an invocation of a monitor function, the parameter value is interpreted and instantiated beforehand with all of the requisite validations applied to it. Assuming it passes type and value validations, it will become available in the body of the function as a global value of the same name, with the assigned value.

  • Parameters marked as required in the monitor function STJ definition must be supplied at least once. A parameter that is not required can be omitted.

  • Parameters marked as multiple in the monitor function STJ definition can be supplied more than once, meaning that they represent a set of values.
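
The required and multiple rules above can be sketched as a small pre-flight check. This is an illustration only, not StackState's own validation: the function name is hypothetical, and two of its rules are assumptions that go beyond the text (an argument bound to an undeclared parameter is reported, and a parameter not marked as multiple is reported when supplied more than once).

```python
from collections import Counter


def validate_arguments(parameters, argument_names):
    """Return a list of problems found when binding arguments to parameters.

    parameters: dicts with "name", "required" and "multiple" keys, mirroring
                the Parameter declarations of a monitor function.
    argument_names: the parameter name that each supplied argument binds to.
    """
    declared = {p["name"]: p for p in parameters}
    counts = Counter(argument_names)
    problems = []
    for name in counts:
        if name not in declared:  # assumption: undeclared names are errors
            problems.append(f"unknown parameter: {name}")
    for name, param in declared.items():
        supplied = counts.get(name, 0)
        if param["required"] and supplied == 0:
            problems.append(f"missing required parameter: {name}")
        if not param["multiple"] and supplied > 1:  # assumption, see lead-in
            problems.append(f"parameter supplied more than once: {name}")
    return problems


PARAMETERS = [
    {"name": "threshold", "required": True, "multiple": False},
    {"name": "topologyIdentifierPattern", "required": True, "multiple": False},
    {"name": "metrics", "required": True, "multiple": False},
]

print(validate_arguments(PARAMETERS, ["threshold", "metrics"]))
```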

Common parameters

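Descriptions of parameters that are commonly used by monitor functions can be found below:

  • Numeric values - a simple numeric value.

  • Topology query - a query to return a subset of the topology.

  • Telemetry query - a query that returns the telemetry to be passed to the monitor function.

  • Topology identifier pattern - the pattern of the topology element identifiers to which the monitor function should assign calculated health states.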

Numeric values

The most common and simple monitor function parameter types are numeric values.

To supply a value to the value parameter defined in the monitor function, the monitor STJ definition would look something like the following:

...
{
  "_type": "ArgumentDoubleVal",
  "parameter": {{ get "<identifier-of-the-function>" "Type=Parameter;Name=value" }},
  "value": 23.5
}
...

The declaration of a numeric value in a monitor function STJ definition can look something like the following:

...
"parameters": [{
  "_type": "Parameter",
  "type": "DOUBLE",
  "name": "value",
  "required": true,
  "multiple": false
},
  ...
]
...

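Topology query

Monitor functions that utilize topology often take a topology query as a parameter. An external tool can be used to allow you to easily work with queries in YAML format and add these to a monitor file in STJ format.

To supply a value to the topologyQuery parameter defined in the monitor function, the monitor STJ definition would look something like the following: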

...
{
  "_type": "ArgumentStringVal",
  "parameter": {{ get "<identifier-of-the-function>" "Type=Parameter;Name=topologyQuery" }},
  "value": "type = 'database' OR type = 'database-shard'"
}
...

The declaration of a topology query in a monitor function STJ definition can look something like the following:

...
"parameters": [{
  "_type": "Parameter",
  "type": "STRING",
  "name": "topologyQuery",
  "required": true,
  "multiple": false
},
  ...
]
...

Telemetry query

Monitor functions that utilize telemetry tend to be parameterized with the exact telemetry query to use for their computation. The telemetry query should be built using the StackState Telemetry Script API. The following fields are particularly useful in telemetry queries that are passed to monitor functions:

  • groupBy(fields) - when a monitor will produce a health state for multiple components, use the groupBy field to produce multiple time series as a set of unique values for the defined fields.

  • aggregation(type, interval) - aggregates each time series by the defined type. Each aggregated value is constructed out of a data span the size of the defined interval.

➡️ Learn more about the Telemetry script API
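
To make the effect of groupBy and aggregation more concrete, here is a rough Python sketch of the idea: samples are split into one series per distinct value of a field, and each series is averaged over fixed-size intervals. This is not how StackState implements the Telemetry script API. The function names, the sample layout and the way buckets are aligned to multiples of the interval are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_mean(points, interval_seconds):
    """Average (timestamp_seconds, value) points per fixed-size interval."""
    buckets = {}
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def group_and_aggregate(samples, group_field, interval_seconds):
    """Build one series per distinct value of group_field, then aggregate each."""
    series = {}
    for sample in samples:
        point = (sample["timestamp"], sample["value"])
        series.setdefault(sample[group_field], []).append(point)
    return {key: aggregate_mean(points, interval_seconds)
            for key, points in series.items()}


samples = [
    {"host": "a", "timestamp": 0, "value": 10.0},
    {"host": "a", "timestamp": 30, "value": 20.0},
    {"host": "a", "timestamp": 60, "value": 40.0},
    {"host": "b", "timestamp": 0, "value": 5.0},
]
print(group_and_aggregate(samples, "host", 60))
```

Each key of the result corresponds to one time series, which is why a monitor that should report a health state per host groups its query by a host field.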

To supply a value to the telemetryQuery parameter defined in the monitor function, the monitor STJ definition would look something like the following. Note that the provided value must utilize the StackState Telemetry Script API and evaluate to a telemetry query, otherwise it won't pass the argument validation that is performed before the function execution begins.

...
{
  "_type": "ArgumentScriptMetricQueryVal",
  "parameter": {{ get "<identifier-of-the-function>" "Type=Parameter;Name=telemetryQuery" }},
  "script": "Telemetry.query('StackState Metrics', '').metricField('system.cpu.iowait').groupBy('tags.host').start('-10m').aggregation('mean', '1m')"
}
...

The declaration of a telemetry query can either expect a string value of a metric name, or a full-fledged Telemetry Query:

...
"parameters": [{
  "_type": "Parameter",
  "type": "SCRIPT_METRIC_QUERY",
  "name": "telemetryQuery",
  "required": true,
  "multiple": false
},
  ...
]
...

Topology identifier pattern

Monitor functions that don't process any topology directly still have to produce results that attach to topology elements by way of matching the topology identifier that can be found on those elements. In those cases, one can expect a function declaration to include a special parameter that represents the pattern of a topology identifier.

The topologyIdentifierPattern value supplied to the monitor function should result in a valid topology identifier once processed by the function logic. It therefore likely needs to include various escape sequences of values that will be interpolated into the resulting value by the monitor function:

...
{
  "_type": "ArgumentStringVal",
  "parameter": {{ get "<identifier-of-the-function>" "Type=Parameter;Name=topologyIdentifierPattern" }},
  "value": "urn:host:/${tags.host}"
}
...

The exact value to use for this parameter depends on the topology available in StackState (or more precisely on its identifier scheme), and on the values supplied by the monitor function for interpolation (or more precisely the type of data processed by the function). In the most common case, a topology identifier pattern parameter is used in conjunction with a telemetry query parameter. In this case, the fields used for the telemetry query grouping (listed in its .groupBy() step) will also be available for the interpolation of topology identifier values. For example, consider the following query:

Telemetry
  .query('StackState Metrics', '')
  .metricField('system.cpu.iowait')
  .groupBy('host', 'region')
  .start('-10m')
  .aggregation('mean', '1m')

The telemetry query above groups its results by two fields: host and region. Both of these values will be available for value interpolation of an exact topology identifier to use, and each different host and region pair can be used either individually or together to form a unique topology identifier. If the common topology identifier scheme utilized by the topology looks as follows, then the different parts of the identifier can be replaced by references to host or region:

# Example identifier as found on a topology element:
'urn:host:/eu-west-1/i-244e275aef2a83dd'

# Topology identifier pattern that matches the above example identifier:
'urn:host:/${region}/${host}'
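
The interpolation described above can be sketched in a few lines of Python. This only illustrates the `${...}` substitution idea; in StackState the substitution is performed by the monitor function itself, and the helper name below is hypothetical.

```python
import re

# Matches ${name}; dotted names such as tags.host are allowed.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def interpolate_identifier(pattern, values):
    """Replace every ${name} in pattern with the matching entry of values."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value available for placeholder: {key}")
        return str(values[key])
    return _PLACEHOLDER.sub(replace, pattern)


# One time series grouped by host and region yields one identifier.
group = {"region": "eu-west-1", "host": "i-244e275aef2a83dd"}
print(interpolate_identifier("urn:host:/${region}/${host}", group))
```

A placeholder without a matching grouping field raises an error in this sketch; how a real monitor function handles that case depends on its own logic.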

The declaration of a topology identifier pattern would look something like the following:

...
"parameters": [{
  "_type": "Parameter",
  "type": "STRING",
  "name": "topologyIdentifierPattern",
  "required": true,
  "multiple": false
},
  ...
]
...

status

A monitor with an ENABLED status will be automatically executed and its results will be persisted. A DISABLED monitor is still available for a dry-run to inspect its results and execution (helpful for debugging a monitor). When a monitor is initially created it will start with a DISABLED status, unless the status field is present in the payload. When a monitor is updated, it will keep its own status, unless the status is specified. If the status field is included in the payload, the monitor will assume the specified status.

When a monitor is disabled, all health states associated with the monitor will be removed, and they will no longer be visible in the StackState UI. Disabling a monitor is quite useful to debug and fix execution errors without having the monitor produce health states or errors. A disabled monitor can still be used to do a dry-run.
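
The status rules above reduce to a short decision, sketched here in Python. The function name and its arguments are hypothetical; the logic only restates what this section describes.

```python
VALID_STATUSES = ("ENABLED", "DISABLED")


def resolve_status(payload_status=None, previous_status=None):
    """Pick the status a monitor ends up with after a create or update.

    payload_status: the status field of the STJ payload, or None if absent.
    previous_status: the status of the existing monitor, or None if new.
    """
    if payload_status is not None:
        if payload_status not in VALID_STATUSES:
            raise ValueError(f"unsupported status: {payload_status}")
        return payload_status      # an explicit status always wins
    if previous_status is not None:
        return previous_status     # updates keep the current status
    return "DISABLED"              # new monitors start disabled


print(resolve_status())                            # new monitor, no status field
print(resolve_status(previous_status="ENABLED"))   # update without status field
```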

intervalSeconds

The monitor run interval determines how often the monitor logic is executed. It is configured in the monitor STJ file as a number of seconds using the intervalSeconds field. For example, an intervalSeconds: 60 configuration means that StackState will attempt to execute the monitor function associated with the monitor every 60 seconds. If the monitor function execution takes significant time, the next scheduled run will occur 60 seconds after the previous run finishes.
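
As a small worked example of this scheduling rule, the Python sketch below computes when each run would start if every new run is scheduled intervalSeconds after the previous one finishes. The function name and the run durations are made up for illustration, and the real scheduler may differ in details not covered on this page.

```python
def run_start_times(interval_seconds, run_durations, first_start=0):
    """Return the start time of each run, in seconds.

    Each run starts interval_seconds after the previous run has finished.
    """
    starts = []
    start = first_start
    for duration in run_durations:
        starts.append(start)
        start = start + duration + interval_seconds
    return starts


# Three runs that take 5, 90 and 10 seconds with intervalSeconds: 60.
print(run_start_times(60, [5, 90, 10]))
```

With these durations the second run starts at 65 seconds and the third at 215 seconds, so a slow run pushes back every run after it.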

Add scripts and queries to STJ

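A monitor STJ file and an STJ monitor function definition contain the following scripts and queries:

  • Arguments of type ArgumentScriptMetricQueryVal in the monitor STJ file define a telemetry query to be used by the monitor function.

  • The property script of type ScriptFunctionBody in the monitor function definition provides a Groovy script that is run by the monitor function. For details of the script property, see the page monitor functions.

It can be challenging to add scripts and queries to the STJ format. An external tool, such as yq (github.com), can be used to get a more friendly formatting of the script or query to work with and update as required.

Update a query defined in ArgumentScriptMetricQueryVal for a monitor using the external tool yq (github.com) to get a more friendly formatting. For example: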

yq -P ./monitor.stj > monitor.yaml

This produces something like the following:

_version: 1.0.39
timestamp: 2022-05-23T13:16:27.369269Z[GMT]
nodes:
  - _type: Monitor
    name: CPU Usage
    description: A simple CPU-usage monitor. If the metric is above a given threshold, the state is set to CRITICAL.
    identifier: urn:system:default:monitor:cpu-usage
    remediationHint: Turn it off and on again.
    function:? get "urn:system:default:monitor-function:metric-above-threshold"::
    arguments:
      - _type: ArgumentDoubleVal
        parameter:? get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=threshold"::
        value: 90.0
      - _type: ArgumentStringVal
        parameter:? get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=topologyIdentifierPattern"::
        value: urn:host:/${tags.host}
      - _type: ArgumentScriptMetricQueryVal
        parameter:? get "urn:system:default:monitor-function:metric-above-threshold" "Type=Parameter;Name=metrics"::
        script: |-
          Telemetry
          .query("StackState Metrics", "")
          .metricField("system.cpu.system")
          .groupBy("tags.host")
          .start("-1m")
          .aggregation("mean", "15s")
    intervalSeconds: 60

Here the ArgumentScriptMetricQueryVal script (query) is readable and more easily editable in a YAML representation of the monitor.

After the script, or any other field, has been edited in the YAML representation, you can go back to the STJ representation using:

yq -o=json '.' monitor.yaml
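
The reason a script is hard to read in STJ is that a multi-line script has to be stored as a single JSON string with escaped line breaks. The Python sketch below shows that encoding step on its own, using only the standard json module. It is not a replacement for yq and it does not parse a whole STJ file, which contains template expressions such as {{ get ... }} that are not plain JSON.

```python
import json

# The telemetry query from the example monitor, one step per line.
script = "\n".join([
    "Telemetry",
    ".query('StackState Metrics', '')",
    ".metricField('system.cpu.system')",
    ".groupBy('tags.host')",
    ".start('-1m')",
    ".aggregation('mean', '15s')",
])

# Encode as a JSON string value: real line breaks become the two
# characters backslash and n, as in the "script" field of the monitor.
encoded = json.dumps(script)
print(encoded)

# Decoding gives back the readable multi-line form.
decoded = json.loads(encoded)
print(decoded)
```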

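StackState self-hosted only

Update a monitor function using the external tool yq (github.com) to get a more friendly formatting. This uses the example monitor function shown on the page monitor functions: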

yq -P ./monitorFunction.stj > monitorFunction.yaml

This will obtain:

_version: 1.0.39
timestamp: 2022-06-23T23:23:23.269369Z[GMT]
nodes:
  - _type: MonitorFunction
    name: Metric above threshold
    description: Validates that a metric value stays below a given threshold, reports a CRITICAL state otherwise.
    parameters:
      - _type: Parameter
        name: threshold
        type: DOUBLE
        required: true
        multiple: false
      - _type: Parameter
        name: topologyIdentifierPattern
        type: STRING
        required: true
        multiple: false
      - _type: Parameter
        name: metrics
        type: SCRIPT_METRIC_QUERY
        required: true
        multiple: false
    identifier: urn:system:default:monitor-function:metric-above-threshold
    script:
      _type: ScriptFunctionBody
      scriptBody: |-
        def checkThreshold(timeSeries, threshold) {
          timeSeries.points.any { point -> point.last() > threshold }
        }

        metrics.map { result ->
          def state = "CLEAR"
          if (checkThreshold(result.timeSeries, threshold)) {
            state = "CRITICAL";
          }

          return [
            _type: "MonitorHealthState",
            id: result.timeSeries.id.toIdentifierString(),
            state: state,
            topologyIdentifier: StringTemplate.runForTimeSeriesId(topologyIdentifierPattern, result.timeSeries.id),
            displayTimeSeries: [
              [
                _type: "DisplayTimeSeries",
                name: "The resulting metric values",
                query: result.query,
                timeSeriesId: result.timeSeries.id
              ]
            ]
          ]
        }

The script is now readable and easier to edit. After editing the script, or any other field, of the monitor function in the YAML representation, you can go back to the STJ representation using:

yq -o=json '.' monitorFunction.yaml

This can then be added back to the property script of type ScriptFunctionBody.
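
For readers less familiar with Groovy, the threshold logic of the example script can be restated in Python. This is a loose translation for illustration only and cannot run inside StackState. It assumes that each point is a [timestamp, value] pair, which is what the point.last() call in the script suggests, and the function names are hypothetical.

```python
def check_threshold(points, threshold):
    """True if the last element (the value) of any point exceeds threshold."""
    return any(point[-1] > threshold for point in points)


def health_state(points, threshold):
    """Map a time series to the state names used by the example script."""
    return "CRITICAL" if check_threshold(points, threshold) else "CLEAR"


# One value above the 90.0 threshold is enough for a CRITICAL state.
print(health_state([[0, 42.0], [15, 95.5], [30, 61.0]], 90.0))
```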

See also

  • Create a custom monitor
  • Monitor functions
  • Manage monitors
  • STJ reference