Agent check API

StackState Self-hosted v5.1.x


Last updated 2 years ago

Overview

The Agent check API can be used to create checks that run on the StackState Agent. This page explains how to use the Agent check API to write checks that send topology, metrics, events and service status information to StackState.

Code examples for the open source StackState Agent checks can be found on GitHub.

Agent checks

Agent 2.18 introduced Agent Check V2, which differs from historic Agent Checks in two key ways:

  • A V2 Agent Check must return a value in the form of a CheckResponse.

  • Agent Check V2 includes two new check base classes: StatefulAgentCheck and TransactionalAgentCheck.

Agent Check V2 (Agent 2.18+)

An Agent Check is a Python class that inherits from AgentCheckV2 and implements the check method:

from stackstate_checks.base import TopologyInstance
from stackstate_checks.base.checks.v2.base import AgentCheckV2
from stackstate_checks.checks import CheckResponse

class MyCheck(AgentCheckV2):
    def check(self, instance): # type: (InstanceType) -> CheckResponse
        # Collect metrics and topologies, events, return CheckResponse
        return CheckResponse()

    def get_instance_key(self, instance):
        # Provide an identifier (TopologyInstance).
        # "example" and the "url" key are placeholder values.
        return TopologyInstance("example", instance["url"])

Error Handling

In the event of a check error, the exception should be returned as part of the check response:

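from stackstate_checks.base import TopologyInstance
from stackstate_checks.base.checks.v2.base import AgentCheckV2
from stackstate_checks.checks import CheckResponse

class MyCheck(AgentCheckV2):
    def check(self, instance): # type: (InstanceType) -> CheckResponse
        # Collect metrics and topologies, events, return CheckResponse
        try:
            this_triggers_an_exception()  # placeholder for the collection logic
        except Exception as e:
            # Return the exception in the response instead of raising it
            return CheckResponse(check_error=e)
        return CheckResponse()

    def get_instance_key(self, instance):
        # Provide an identifier (TopologyInstance).
        # "example" and the "url" key are placeholder values.
        return TopologyInstance("example", instance["url"])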

StatefulAgentCheck (Agent 2.18+)

A Stateful Agent Check is a Python class that inherits from StatefulAgentCheck and implements the stateful_check method. It is intended for Agent checks that need to persist data across check runs and have that data available after an Agent failure. If an Agent failure occurs, the persisted state is used in the next check run. Persistent state is saved even when the check fails. The stateful_check method receives the current persistent state as an input parameter. The persistent_state parameter of the returned CheckResponse is then stored as the new persistent state value.

from stackstate_checks.base import TopologyInstance
from stackstate_checks.base.checks.v2.stateful_agent_check import StatefulAgentCheck
from stackstate_checks.checks import CheckResponse

class MyCheck(StatefulAgentCheck):
    def stateful_check(self, instance, persistent_state): # type: (InstanceType, StateType) -> CheckResponse
        # Collect metrics and topologies, events, return CheckResponse
        return CheckResponse(persistent_state=persistent_state)

    def get_instance_key(self, instance):
        # Provide an identifier (TopologyInstance).
        # "example" and the "url" key are placeholder values.
        return TopologyInstance("example", instance["url"])
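
The way persistent state travels from one run to the next can be sketched without the Agent. The snippet below is a stand-alone simulation, not the Agent's implementation: FakeResponse and run_once are hypothetical stand-ins that don't exist in stackstate_checks, and the state is a plain dictionary kept in memory.

```python
# Stand-alone simulation; none of these names come from stackstate_checks.
class FakeResponse(object):
    """Hypothetical stand-in for CheckResponse."""

    def __init__(self, persistent_state=None):
        self.persistent_state = persistent_state


def stateful_check(instance, persistent_state):
    """Count the runs and hand the updated state back in the response."""
    state = dict(persistent_state or {})
    state["runs"] = state.get("runs", 0) + 1
    return FakeResponse(persistent_state=state)


def run_once(instance, stored_state):
    """Hypothetical runner: pass the stored state in, keep what comes back."""
    response = stateful_check(instance, stored_state)
    return response.persistent_state


stored = None  # nothing has been persisted before the first run
for _ in range(3):
    stored = run_once({"url": "http://localhost"}, stored)

print(stored)  # {'runs': 3}
```

The check never mutates the state it receives. It returns a new value, and whatever is returned becomes the input of the next run.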

TransactionalAgentCheck (Agent 2.18+)

A Transactional Agent Check is a Python class that inherits from TransactionalAgentCheck and implements the transactional_check method. It is intended for Agent checks that require transactional behavior when updating their state. An Agent Check transaction is considered a success if the data submitted by the Agent Check reaches StackState. This ensures that a check never processes or submits data that StackState has already received. Persistent state is saved even when the check fails, while transactional state is only saved once a transaction has succeeded. The transactional_check method receives the current transactional and persistent state as input parameters. The transactional_state and persistent_state parameters of the returned CheckResponse are then stored as the new state values.

from stackstate_checks.base import TopologyInstance
from stackstate_checks.base.checks.v2.transactional_agent_check import TransactionalAgentCheck
from stackstate_checks.checks import CheckResponse

class MyCheck(TransactionalAgentCheck):
    def transactional_check(self, instance, transactional_state, persistent_state):
        # type: (InstanceType, StateType, StateType) -> CheckResponse
        # Collect metrics and topologies, events, return CheckResponse
        return CheckResponse(transactional_state=transactional_state, persistent_state=persistent_state)

    def get_instance_key(self, instance):
        # Provide an identifier (TopologyInstance).
        # "example" and the "url" key are placeholder values.
        return TopologyInstance("example", instance["url"])
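
The difference between the two kinds of state can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the Agent's actual mechanism: run_transaction and the delivered flag are hypothetical, and the example check returns a plain tuple instead of a CheckResponse.

```python
def example_check(transactional_state, persistent_state):
    """Advance an offset (transactional) and count attempts (persistent)."""
    offset = (transactional_state or {}).get("offset", 0)
    attempts = (persistent_state or {}).get("attempts", 0)
    return {"offset": offset + 10}, {"attempts": attempts + 1}


def run_transaction(check, stored, delivered):
    """Hypothetical runner; `delivered` says whether the data reached StackState."""
    new_transactional, new_persistent = check(
        stored["transactional"], stored["persistent"])
    stored["persistent"] = new_persistent  # kept after every run
    if delivered:
        stored["transactional"] = new_transactional  # kept only on success
    return stored


stored = {"transactional": None, "persistent": None}
run_transaction(example_check, stored, delivered=True)
run_transaction(example_check, stored, delivered=False)

print(stored["transactional"])  # {'offset': 10}
print(stored["persistent"])     # {'attempts': 2}
```

Because the second transaction did not succeed, the next run starts again from offset 10, while the attempt counter still reflects both runs.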

Agent Check (To be deprecated)

An Agent Check is a Python class that inherits from AgentCheck and implements the check method:

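from stackstate_checks.base import TopologyInstance
from stackstate_checks.checks import AgentCheck

class MyCheck(AgentCheck):
    def check(self, instance):
        # Collect metrics and topologies, emit events, submit service checks
        pass

    def get_instance_key(self, instance):
        # Provide an identifier (TopologyInstance).
        # "example" and the "url" key are placeholder values.
        return TopologyInstance("example", instance["url"])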

Agent Checks (all)

The Agent creates an object of type MyCheck for each element contained in the instances sequence of the corresponding Agent Check configuration file:

instances:
  - host: localhost
    port: 6379

  - host: example.com
    port: 6379

Each mapping in the instances section of the Agent Check configuration file is passed to the check method as the instance argument.
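
As a rough illustration, the configuration above corresponds to the Python structure below, and the check logic runs once per entry. This is a sketch: the describe_target function is hypothetical, and the way the Agent loads the configuration file is not shown.

```python
# The parsed configuration, written out as a Python dictionary.
config = {
    "init_config": {},
    "instances": [
        {"host": "localhost", "port": 6379},
        {"host": "example.com", "port": 6379},
    ],
}


def describe_target(instance):
    """Hypothetical check logic: read the values of one instance mapping."""
    # .get() with a default keeps the check tolerant of a missing optional key
    return "%s:%d" % (instance["host"], instance.get("port", 6379))


targets = [describe_target(instance) for instance in config["instances"]]
print(targets)  # ['localhost:6379', 'example.com:6379']
```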

The AgentCheck, AgentCheckV2, StatefulAgentCheck and TransactionalAgentCheck classes provide the following methods and attributes:

  • self.name - the name of the check

  • self.init_config - the init_config section of the check configuration

Scheduling

Multiple instances of the same check can run concurrently. If a check is already running, it isn't necessary to schedule another one.

Send data

Topology

Topology elements can be sent to StackState with the following methods:

  • self.start_snapshot() - Start a topology snapshot for a specific topology instance source.

  • self.stop_snapshot() - Stop a topology snapshot for a specific topology instance source.

Send components

Components can be sent to StackState using the self.component(id, type, data) method.

self.component(
        "urn:example:/host:this_host", # the ID
        "Host", # the type
        data={
            "name": "this-host",
            "domain": "Webshop",
            "layer": "Machines",
            "identifiers": ["urn:host:/this-host-fqdn"],
            "labels": ["host:this_host", "region:eu-west-1"],
            "environment": "Production"
            })

The method requires the following details:

  • id - string. A unique ID for this component. This has to be unique for this instance.

  • type - string. The name of the component type.

  • data - dictionary. A JSON blob of arbitrary data. The fields within this object can be referenced in the ComponentTemplateFunction and the RelationTemplateFunction within StackState.

All submitted topologies are collected by StackState and flushed together with all the other Agent metrics at the end of the check function.

Send relations

Relations can be sent to StackState using the self.relation(source_id, target_id, type, data) method.

self.relation(
        "nginx3.e5dda204-d1b2-11e6-a015-0242ac110005",   # source ID
        "nginx5.0df4bc1e-c695-4793-8aae-a30eba54c9d6",   # target ID
        "uses_service",  # type
        {})   # data

The method requires the following details:

  • source_id - string. The source component externalId.

  • target_id - string. The target component externalId.

  • type - string. The type of relation.

  • data - dictionary. A JSON blob of arbitrary data. The fields within this object can be referenced in the ComponentTemplateFunction and the RelationTemplateFunction within StackState.

All submitted topologies are collected by StackState and flushed together with all the other Agent metrics at the end of the check function.
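
The topology methods are typically used together in one check run. The sketch below shows one possible ordering using a stand-in class that only records calls, so it can run without the Agent. RecordingCheck, the urn:example identifiers, and the "app" and "host" instance keys are all hypothetical. A real check would inherit from one of the Agent check base classes instead.

```python
class RecordingCheck(object):
    """Hypothetical stand-in that records calls instead of sending data."""

    def __init__(self):
        self.calls = []

    def start_snapshot(self):
        self.calls.append(("start_snapshot",))

    def stop_snapshot(self):
        self.calls.append(("stop_snapshot",))

    def component(self, id, type, data):
        self.calls.append(("component", id, type))

    def relation(self, source_id, target_id, type, data):
        self.calls.append(("relation", source_id, target_id, type))

    def check(self, instance):
        self.start_snapshot()
        host_id = "urn:example:/host:%s" % instance["host"]
        app_id = "urn:example:/application:%s" % instance["app"]
        self.component(host_id, "Host", {"name": instance["host"]})
        self.component(app_id, "Application", {"name": instance["app"]})
        # The relation refers to both components by the IDs used above
        self.relation(app_id, host_id, "runs_on", {})
        self.stop_snapshot()


check = RecordingCheck()
check.check({"host": "this-host", "app": "webshop"})
for call in check.calls:
    print(call)
```

Reusing the same ID variables for the components and the relation keeps the source and target of the relation in step with the components they refer to.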

Metrics

Metrics can be sent to StackState with the following methods:

  • self.gauge - Sample a gauge metric.

  • self.count - Sample a raw count metric.

  • self.rate - Sample a point, with the rate calculated at the end of the check.

  • self.increment - Increment a counter metric.

  • self.decrement - Decrement a counter metric.

  • self.histogram - Sample a histogram metric.

  • self.historate - Sample a histogram based on rate metrics.

  • self.monotonic_count - Sample an increasing counter metric.

self.gauge(
        "test.metric", # the metric name
        10.0, # value of the metric
        tags=[
          "tag_key1:tag_value1",
          "tag_key2:tag_value2"
          ],
        hostname="localdocker.test") # the hostname

Each method accepts the following metric details:

  • name - the name of the metric.

  • value - the value for the metric. Defaults to 1 on increment, -1 on decrement.

  • tags - optional. A list of tags to associate with this metric.

  • hostname - optional. A hostname to associate with this metric. Defaults to the current host.

All submitted metrics are collected and flushed with all the other Agent metrics at the end of the check function.
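
Tags in the examples on this page are written as key:value strings. When the values come from a dictionary, a small helper can build the list. The build_tags function below is a hypothetical convenience, not part of the Agent check API.

```python
def build_tags(mapping):
    """Turn {'key': 'value'} pairs into sorted 'key:value' tag strings."""
    return sorted("%s:%s" % (key, value) for key, value in mapping.items())


tags = build_tags({"region": "eu-west-1", "hostname": "this-host"})
print(tags)  # ['hostname:this-host', 'region:eu-west-1']
```

The resulting list could then be passed as the tags argument of a metric method such as self.gauge.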

Events

Events can be sent to StackState with the self.event(event_dict) method.

self.event(  
        {
        "context": {
          "category": "Changes",
          "data": { 
            "data_key1":"data_value1",
            "data_key2":"data_value2"
          },
          "element_identifiers": [
            "element_identifier1",
            "element_identifier2"
            ],
          "source": "source_system",
          "source_links": [
            {
              "title": "link_title",
              "url": "link_url"
              }
            ]
          },    
        "event_type": "event_typeEvent",
        "msg_title": "event_title",
        "msg_text": "event_text",
        "source_type_name": "source_event_type",
        "tags": [
          "tag_key1:tag_value1",
          "tag_key2:tag_value2",
          ],
        "timestamp": 1607432944
        })

Note that msg_title and msg_text are required fields from Agent V2.11.0.

All events will be collected and flushed with the rest of the Agent payload at the end of the check function.
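
Since msg_title and msg_text are required, it can help to build the event dictionary in one place and validate it before calling self.event. The build_event helper below is a hypothetical sketch. The "example_source" and "ExampleEvent" values are placeholders, and only a subset of the fields shown above is filled in.

```python
import time

REQUIRED_FIELDS = ("msg_title", "msg_text")


def build_event(title, text, element_identifiers, tags=None, timestamp=None):
    """Build an event dictionary and reject it when a required field is empty."""
    event = {
        "context": {
            "category": "Changes",
            "element_identifiers": list(element_identifiers),
            "source": "example_source",  # placeholder value
        },
        "event_type": "ExampleEvent",  # placeholder value
        "msg_title": title,
        "msg_text": text,
        "tags": list(tags or []),
        "timestamp": int(time.time() if timestamp is None else timestamp),
    }
    missing = [field for field in REQUIRED_FIELDS if not event[field]]
    if missing:
        raise ValueError("missing required event fields: %s" % ", ".join(missing))
    return event


event = build_event(
    "Deployment finished",
    "Version 1.2 was deployed",
    ["urn:host:/this-host-fqdn"],
    tags=["region:eu-west-1"],
    timestamp=1607432944)
print(event["msg_title"], event["timestamp"])
```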

Status (Agent Check only)

Reporting status of a service is handled by calling the service_check method:

self.service_check(name, status, tags=None, message="")

The method can accept the following arguments:

  • name - the name of the service check

  • status - a constant describing the service status defined in the AgentCheck class:

    • AgentCheck.OK for success status.

    • AgentCheck.WARNING for warning status.

    • AgentCheck.CRITICAL for failure status.

    • AgentCheck.UNKNOWN for indeterminate status.

  • tags - a list of tags to associate with the check. (optional)

  • message - additional information about the current status. (optional)

This will be fully deprecated in Agent Check V2 in favour of the CheckResponse.

Health

Health information can be sent to StackState with the following methods:

  • self.health.check_state - send a check state as part of a snapshot.

  • self.health.start_snapshot() - start a health snapshot. StackState will only process health information if it's sent as part of a snapshot.

  • self.health.stop_snapshot() - stop the snapshot, signaling that all submitted data is complete. This should be done at the end of the check after all data has been submitted. If exceptions occur in the check or not all data can be produced for some other reason, this function should not be called.
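
The rule about not calling stop_snapshot after an error can be sketched with a stand-in object. RecordingHealth below is hypothetical and only records calls in memory, and health values are passed as plain strings here rather than as Health values.

```python
class RecordingHealth(object):
    """Hypothetical stand-in for self.health that records calls in memory."""

    def __init__(self):
        self.calls = []

    def start_snapshot(self):
        self.calls.append("start_snapshot")

    def check_state(self, check_state_id, name, health_value, topology_element_identifier):
        self.calls.append("check_state:%s" % check_state_id)

    def stop_snapshot(self):
        self.calls.append("stop_snapshot")


def submit_health(health, results):
    health.start_snapshot()
    for result in results:
        health.check_state(
            result["id"], result["name"], result["health"], result["element"])
    # Reached only when every check state was submitted. An exception above
    # propagates to the caller and the snapshot is never stopped.
    health.stop_snapshot()


complete = RecordingHealth()
submit_health(complete, [
    {"id": "disk", "name": "Disk usage", "health": "CLEAR",
     "element": "urn:host:/this-host-fqdn"},
])

incomplete = RecordingHealth()
try:
    submit_health(incomplete, [{"id": "cpu"}])  # malformed result raises KeyError
except KeyError:
    pass

print(complete.calls)    # ['start_snapshot', 'check_state:disk', 'stop_snapshot']
print(incomplete.calls)  # ['start_snapshot']
```

Placing stop_snapshot after the loop, outside any finally block, is what keeps an incomplete data set from being marked as complete.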

Set up a health stream

To make the self.health API available, override the get_health_stream function to define a URN identifier for the health synchronization stream.

from stackstate_checks.base import AgentCheck, ConfigurationError, HealthStreamUrn, HealthStream

...

class ExampleCheck(AgentCheck):

...

    def get_health_stream(self, instance):
        if 'url' not in instance:
            raise ConfigurationError('Missing url in topology instance configuration.')
        instance_url = instance['url']
        return HealthStream(
          urn=HealthStreamUrn("example", instance_url),
          sub_stream=self.hostname,
          repeat_interval_seconds=20,
          expiry_seconds=60
        )

...

The HealthStream class has the following options:

  • urn - HealthStreamUrn. The stream urn under which the health information will be grouped.

  • sub_stream - string. Optional. Allows for separating disjoint data sources within a single health synchronization stream. For example, when the data for the stream is reported separately by different hosts.

  • repeat_interval_seconds - integer. Optional. The interval with which data will be repeated, defaults to collection_interval (min_collection_interval for Agent V2.14.x or earlier). This allows StackState to detect when data arrives later than expected.

  • expiry_seconds - integer. Optional. The time after which all data from the stream or substream should be removed. Set to '0' to disable expiry (this is only possible when the sub_stream parameter is omitted). Default 4*repeat_interval_seconds.
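
The defaults described above depend on each other, which a short calculation makes explicit. The function below is a hypothetical illustration of those rules and isn't part of the HealthStream class.

```python
def effective_health_stream_settings(collection_interval,
                                     repeat_interval_seconds=None,
                                     expiry_seconds=None):
    """Apply the documented defaults to the optional HealthStream settings."""
    if repeat_interval_seconds is None:
        repeat_interval_seconds = collection_interval
    if expiry_seconds is None:
        expiry_seconds = 4 * repeat_interval_seconds
    return {
        "repeat_interval_seconds": repeat_interval_seconds,
        "expiry_seconds": expiry_seconds,
    }


print(effective_health_stream_settings(15))
print(effective_health_stream_settings(15, repeat_interval_seconds=20))
print(effective_health_stream_settings(15, 20, expiry_seconds=0))
```

With a collection interval of 15 seconds and no overrides, data repeats every 15 seconds and expires after 60 seconds. Passing 0 for expiry_seconds is kept as 0, which disables expiry.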

Send check states

Check states can be sent to StackState using the self.health.check_state(check_state_id, name, health_value, topology_element_identifier, message) method.

from stackstate_checks.base import Health

...

self.health.check_state(
  check_state_id="check_state_from_example_1",
  name="Example check state",
  health_value=Health.CRITICAL,
  topology_element_identifier="urn:component/the_component_to_attach_to",
  message="Optional clarifying message"
)

The method requires the following details:

  • check_state_id - string. Uniquely identifies the check state within the (sub)stream.

  • name - string. Display name for the health check state.

  • health_value - Health. The StackState health value, can be CLEAR, DEVIATING or CRITICAL.

  • topology_element_identifier - string. The component or relation identifier that the check state should bind to. The check state will be associated with all components/relations that have the specified identifier.

  • message - string. Optional. Extended message to display with the health state. Supports Markdown.

Checks and streams

Streams and health checks can be sent to StackState together with a topology component. These can then be mapped together in StackState by a StackPack to give you telemetry streams and health states on your components.

All telemetry classes and methods can be imported from stackstate_checks.base. The following stream types can be added:

  • Events stream

  • Metric stream

  • Service check stream

In the example below, a MetricStream is created on the metric system.cpu.usage with some conditions specific to a component. A health check (check) maximum_average is then created on this metric stream using this_host_cpu_usage.identifier. The stream and check are then added to the streams and checks list for the component this-host.

this_host_cpu_usage = MetricStream(
                              "Host CPU Usage", 
                              "system.cpu.usage",
                              conditions={
                                  "tags.hostname": "this-host", 
                                  "tags.region": "eu-west-1"
                                  },
                              unit_of_measure="Percentage",
                              aggregation="MEAN",
                              priority="HIGH")

cpu_max_average_check = MetricHealthChecks.maximum_average(
                                this_host_cpu_usage.identifier,
                                "Max CPU Usage (Average)", 
                                75, 
                                90,
                                remediation_hint="Too much activity")

self.component(
        "urn:example:/host:this_host", 
        "Host",
        data={
            "name": "this-host",
            "domain": "Webshop",
            "layer": "Machines",
            "identifiers": ["urn:host:/this-host-fqdn"],
            "labels": ["host:this_host", "region:eu-west-1"],
            "environment": "Production"
            },
        streams=[this_host_cpu_usage],
        checks=[cpu_max_average_check])

Events stream

Log streams containing events can be added to a component using the EventStream class.

EventStream(
        "Host events stream", # name
        conditions={
          "key1": "value1", 
          "key2": "value2"
          })

Each events stream has the following details:

  • name - The name for the stream in StackState.

  • conditions - A dictionary of key:value arguments that are used to filter the event values for the stream.

Event stream health check

Event stream health checks can optionally be mapped to an events stream using the stream identifier. The following event stream health checks are supported out of the box:

  • contains_key_value - Checks that the last event contains (at the top level) the specified value for a key.

  • use_tag_as_health - Returns the value of a tag in the event as the health state.

  • custom_health_check - Provides the functionality to send in a custom event health check.

EventHealthChecks.contains_key_value(
        "this_host_events",   # stream_id
        "Events on this host",  # name
        "key1",   # contains_key
        "value1",   # contains_value
        "CRITICAL",  # health state when key found
        "CLEAR",   # health state when key not found
        remediation_hint="Bad event found!")

An event stream health check includes the details listed below. Note that a custom_health_check only requires a name and check_arguments:

  • stream_id - the identifier of the stream the check should run on.

  • name - the name the check will have in StackState.

  • description - the description for the check in StackState.

  • remediation_hint - the remediation hint to display when the check returns a CRITICAL health state.

  • contains_key - for check contains_key_value only. The key that should be contained in the event.

  • contains_value - for check contains_key_value only. The value that should be contained in the event.

  • found_health_state - for check contains_key_value only. The health state to return when this tag and value is found.

  • missing_health_state - for check contains_key_value only. The health state to return when the tag/value isn't found.

  • tag_name - for check use_tag_as_health only. The key of the tag that should be used as the health state.
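
The intent of contains_key_value can be sketched in plain Python. This is an illustration of the description above under the assumption that an event is a flat dictionary. It isn't how StackState evaluates the check internally.

```python
def contains_key_value(event, contains_key, contains_value,
                       found_health_state, missing_health_state):
    """Return a health state based on a top-level key/value match in the event."""
    if event.get(contains_key) == contains_value:
        return found_health_state
    return missing_health_state


last_event = {"key1": "value1", "source": "source_system"}
print(contains_key_value(last_event, "key1", "value1", "CRITICAL", "CLEAR"))
print(contains_key_value(last_event, "key1", "other", "CRITICAL", "CLEAR"))
```

The first call prints CRITICAL because the key and value match. The second prints CLEAR.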

Metric stream

Metric streams can be added to a component using the MetricStream class.

MetricStream(
        "Host CPU Usage", # name
        "system.cpu.usage", # metricField
        conditions={
            "tags.hostname": "this-host", 
            "tags.region": "eu-west-1"
            },
        unit_of_measure="Percentage",
        aggregation="MEAN",
        priority="HIGH")

Each metric stream has the following details:

  • name - The name for the stream in StackState.

  • metricField - The name of the metric to select.

  • conditions - A dictionary of key:value arguments that are used to filter the metric values for the stream.

  • unit_of_measure - Optional. The unit of measure for the metric points. It is appended to the stream name: name (unit_of_measure)

  • priority - Optional. The stream priority in StackState, one of NONE, LOW, MEDIUM, HIGH. HIGH priority streams are used for anomaly detection in StackState.

Metric stream health check

Metric stream health checks can optionally be mapped to a metric stream using the stream identifier. Note that some metric health checks require multiple streams for ratio calculations.

The following metric stream health checks are supported out of the box:

  • maximum_average - Calculates the health state by comparing the average of all metric points in the time window against the configured maximum values.

  • maximum_last - Calculates the health state only by comparing the last value in the time window against the configured maximum values.

  • maximum_percentile - Calculates the health state by comparing the specified percentile of all metric points in the time window against the configured maximum values. For the median specify 50 for the percentile. The percentile parameter must be a value > 0 and <= 100.

  • maximum_ratio - Calculates the ratio between the values of two streams and compares it against the critical and deviating value. If the ratio is larger than the specified critical or deviating value, the corresponding health state is returned.

  • minimum_average - Calculates the health state by comparing the average of all metric points in the time window against the configured minimum values.

  • minimum_last - Calculates the health state only by comparing the last value in the time window against the configured minimum values.

  • minimum_percentile - Calculates the health state by comparing the specified percentile of all metric points in the time window against the configured minimum values. For the median specify 50 for the percentile. The percentile must be a value > 0 and <= 100.

  • failed_ratio - Calculates the ratio between the last values of two streams (one is the normal metric stream and one is the failed metric stream). This ratio is compared against the deviating or critical value.

  • custom_health_check - Provides the functionality to send in a custom metric health check.

MetricHealthChecks.maximum_average(
        this_host_cpu_usage.identifier, # stream_id
        "Max CPU Usage (Average)",  # name
        75,   # deviating value
        90,   # critical value
        remediation_hint="Too much activity on host")

A metric stream health check has the details listed below. Note that a custom_health_check only requires a name and check_arguments:

  • name - the name the health check will have in StackState.

  • description - the description for the health check in StackState.

  • deviating_value - the threshold at which point the check will return a DEVIATING health state.

  • critical_value - the threshold at which point the check will return a CRITICAL health state.

  • remediation_hint - the remediation hint to display when the check returns a CRITICAL health state.

  • max_window - the max window size for the metrics.

  • percentile - for maximum_percentile and minimum_percentile checks only. The percentile value to use for the calculation.

  • stream identifier(s):

    • stream_id - for maximum_percentile, maximum_last, maximum_average, minimum_average, minimum_last, minimum_percentile checks. The identifier of the stream the check should run on.

    • denominator_stream_id - for maximum_ratio checks only. The identifier of the denominator stream the check should run on.

    • numerator_stream_id - for maximum_ratio checks only. The identifier of the numerator stream the check should run on.

    • success_stream_id - for failed_ratio checks only. The identifier of the success stream this check should run on.

    • failed_stream_id - for failed_ratio checks only. The identifier of the failures stream this check should run on.
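
How the deviating and critical thresholds lead to a health state can be sketched for maximum_average and maximum_last. This is an illustration only: the functions below are hypothetical, and treating a value equal to a threshold as a breach is an assumption that the text above doesn't confirm.

```python
def health_from_maximum(value, deviating_value, critical_value):
    """Map a value onto a health state using maximum thresholds."""
    # Assumption: a value equal to the threshold counts as a breach.
    if value >= critical_value:
        return "CRITICAL"
    if value >= deviating_value:
        return "DEVIATING"
    return "CLEAR"


def maximum_average(points, deviating_value, critical_value):
    """Compare the average of all points in the window to the thresholds."""
    if not points:
        raise ValueError("no metric points in the time window")
    average = sum(points) / float(len(points))
    return health_from_maximum(average, deviating_value, critical_value)


def maximum_last(points, deviating_value, critical_value):
    """Compare only the last point in the window to the thresholds."""
    if not points:
        raise ValueError("no metric points in the time window")
    return health_from_maximum(points[-1], deviating_value, critical_value)


cpu_usage = [70, 80, 95]
print(maximum_average(cpu_usage, 75, 90))  # DEVIATING
print(maximum_last(cpu_usage, 75, 90))     # CRITICAL
```

The same window gives different results because the average (about 81.7) smooths out the final spike that maximum_last reacts to.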

Service check stream

A Service Check stream can be added to a component using the ServiceCheckStream class. It expects a stream name and conditions for the telemetry query in StackState. Service Check streams have one check that is supported out of the box, which can be mapped using the stream identifier.

class ServiceCheckStream(TelemetryStream):
    """
    creates a service check stream definition for the component that will bind service checks in StackState for the
    conditions.
    args: `name, conditions`
    `name` The name for the stream in StackState
    `conditions` is a dictionary of key -> value arguments that are used to filter the event values for the stream.
    """

class ServiceCheckHealthChecks(object):

    def service_check_health(stream_id, name, description=None, remediation_hint=None):
        """
        Check that returns the service check status as a health status in StackState
        args: `stream_id, name, description, remediation_hint`
        `stream_id` the identifier of the stream this check should run on
        `name` the name this check will have in StackState
        `description` the description for this check in StackState
        `remediation_hint` the remediation hint to display when this check returns a CRITICAL health state
        """

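A service check stream and its health check are tied together through the stream identifier. The sketch below uses small stand-in classes that mirror only the signatures shown above, so that it runs without the Agent installed. The `identifier` attribute follows the metric example earlier on this page; the uuid-based identifier, the returned dictionary, the `check` condition key and the stream and check names are assumptions made for illustration.

```python
import uuid


class ServiceCheckStream(object):
    """Stand-in for the Agent's ServiceCheckStream (illustration only)."""

    def __init__(self, name, conditions):
        self.name = name
        self.conditions = conditions
        # Assumption: each stream exposes a unique identifier.
        self.identifier = str(uuid.uuid4())


class ServiceCheckHealthChecks(object):
    """Stand-in that mirrors the service_check_health signature shown above."""

    @staticmethod
    def service_check_health(stream_id, name, description=None, remediation_hint=None):
        # Assumption: a plain dict stands in for the real check definition.
        return {
            "stream_id": stream_id,
            "name": name,
            "description": description,
            "remediation_hint": remediation_hint,
        }


# Hypothetical stream that selects the service check statuses of one integration.
stream = ServiceCheckStream("Service check status", {"check": "demo.can_connect"})

# The health check is mapped to the stream through the stream identifier.
health_check = ServiceCheckHealthChecks.service_check_health(
    stream.identifier,
    "Integration health",
    remediation_hint="Check the connection settings of the integration")
print(health_check["name"])
```
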
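Logging

The self.log field is a Python logger instance that prints to the main Agent log file. The log level can be set in the Agent configuration file stackstate.yaml. The example below is taken from the StackState MySQL Agent check (github.com).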

def _collect_type(self, key, mapping, the_type):
    self.log.debug("Collecting data with %s" % key)
    if key not in mapping:
        self.log.debug("%s returned None" % key)
        return None
    self.log.debug("Collecting done, value %s" % mapping[key])
    return the_type(mapping[key])
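
To see the debug messages of a helper like the one above outside the Agent, the logger can be configured by hand. In the sketch below, the `DemoCheck` class, the logger name and the sample keys are hypothetical stand-ins; in a real check, `self.log` is provided by the Agent.

```python
import logging


class DemoCheck(object):
    """Hypothetical stand-in for an Agent check; it only provides self.log."""

    def __init__(self):
        self.log = logging.getLogger("demo_check")

    def _collect_type(self, key, mapping, the_type):
        # Values are passed as logger arguments, so formatting only happens
        # when the DEBUG level is actually enabled.
        self.log.debug("Collecting data with %s", key)
        if key not in mapping:
            self.log.debug("%s returned None", key)
            return None
        self.log.debug("Collecting done, value %s", mapping[key])
        return the_type(mapping[key])


# Show DEBUG messages on the console for this standalone run.
logging.basicConfig(level=logging.DEBUG)

check = DemoCheck()
connections = check._collect_type("Threads_connected", {"Threads_connected": "12"}, int)
missing = check._collect_type("Uptime", {}, float)
print(connections, missing)
```

The first call converts the string "12" to an integer, while the second call returns None because the key is absent from the mapping.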

Error handling

A check should raise a meaningful exception when it can't work correctly, for example due to a wrong configuration or a runtime error. Exceptions are logged and shown on the Agent status page. The warning method can be used to log a warning message and display it on the Agent status page. The custom queries check in the example below is taken from the StackState MySQL Agent check (github.com).

self.warning("This will be visible in the status page")
if len(queries) > max_custom_queries:
    self.warning("Max number (%s) of custom queries reached. Skipping the rest."
                 % max_custom_queries)
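
The two failure modes can be sketched side by side: an invalid configuration raises an exception, while a recoverable condition only produces a warning. The `DemoCheck` class, its `warnings` list, the use of `ValueError` and the `url` and `custom_queries` keys are hypothetical; in a real check, `self.warning` is provided by the Agent.

```python
class DemoCheck(object):
    """Hypothetical stand-in for an Agent check (illustration only)."""

    def __init__(self):
        # Collected warnings; the real Agent shows these on its status page.
        self.warnings = []

    def warning(self, message):
        self.warnings.append(message)

    def check(self, instance, max_custom_queries=2):
        # Wrong configuration: the check can't work, so raise an exception.
        if "url" not in instance:
            raise ValueError("Missing 'url' in the instance configuration")

        # Recoverable condition: warn and continue with a reduced workload.
        queries = instance.get("custom_queries", [])
        if len(queries) > max_custom_queries:
            self.warning("Max number (%s) of custom queries reached. Skipping the rest."
                         % max_custom_queries)
            queries = queries[:max_custom_queries]
        return queries


demo = DemoCheck()
kept = demo.check({"url": "http://localhost", "custom_queries": ["q1", "q2", "q3"]})
print(kept, demo.warnings)

try:
    demo.check({})
except ValueError as error:
    print("Check failed: %s" % error)
```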

See also


  • StackState Agent Integrations repo (https://github.com/StackVista/stackstate-agent-integrations)

  • Connect an Agent check with StackState using the Custom Synchronization StackPack

  • Agent check state

  • How to develop Agent checks

  • Developer guide - Custom Synchronization StackPack

  • MetricHealthChecks class (github.com)

  • EventHealthChecks class (github.com)

  • StackState MySQL Agent check (github.com)

  • Python logger (python.org)