Mirroring

Mirroring is the way to connect StackState to custom telemetry sources such as Prometheus or Zabbix. Mirroring is performed by two components: the Mirror Plugin and a Mirror for the remote metric system. The Mirror Plugin is a StackState plugin that is configured to talk to a remote Mirror and requires the Mirror to implement a specific REST API. The Mirror, in turn, is implemented as a microservice and acts as a gateway to the target monitoring system.

A Mirror should not hold any state and should only proxy StackState requests to the target telemetry source. It can be implemented in any technology or programming language. The only requirement is that it implements the Mirror API described below.

Mirror REST API

The Mirror API consists of four methods:

  • Test Connection
  • Get Field Names
  • Get Field Values
  • Get Metric Values

All requests are POST requests, and information is passed in the JSON bodies of the requests and responses.
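
As an illustration of the POST-with-JSON-body convention, the sketch below builds (but does not send) such a request with Python's standard library. The mirror address is hypothetical, and the Content-Type header is an assumption; this document does not state which headers a mirror expects.

```python
import json
import urllib.request


def build_mirror_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries a JSON body."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the mirror accepts a standard JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_mirror_request(
    "http://mirror.example.local:8080",  # hypothetical mirror address
    "api/connection",
    {"_type": "TestConnectionRequest", "connectionDetails": {}},
)
```

Passing the resulting object to urllib.request.urlopen would send it; the response body can then be parsed with json.loads.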

Common Request Information

Each request JSON contains a number of fields. The following fields are common:

  • Connection Details
  • Query Equality Conditions
  • Time Range
  • Result Set Limit

Connection Details

The connectionDetails field is mandatory and must be present in each request. It contains arbitrary JSON configuration for connecting to the target system. The configuration is flexible, and it is up to the mirror implementor to decide which elements to put there, for example the URL of the target telemetry source, timeouts, or an API key.

Example:

  ...
  "connectionDetails" : {
        "port": 9900,
        "host": "prometheus.local.com",
        "requestTimeout": 15000,
        "nan_interpretation": "ZERO"
  }
  ...
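
Because the content of connectionDetails is decided by the mirror implementor, each mirror has to interpret it itself. The sketch below shows one possible interpretation of the example above. The http scheme, the default port, and the reading of requestTimeout as milliseconds are all assumptions, not part of the Mirror API.

```python
def target_base_url(connection_details: dict) -> str:
    """Derive the base URL of the target system from connectionDetails."""
    host = connection_details["host"]
    # Assumption: fall back to a default port when none is configured.
    port = connection_details.get("port", 9090)
    # Assumption: the target system is reached over plain HTTP.
    return f"http://{host}:{port}"


def timeout_seconds(connection_details: dict, default_millis: int = 15000) -> float:
    """Convert requestTimeout (assumed to be in milliseconds) to seconds."""
    return connection_details.get("requestTimeout", default_millis) / 1000.0


details = {
    "port": 9900,
    "host": "prometheus.local.com",
    "requestTimeout": 15000,
    "nan_interpretation": "ZERO",
}
base_url = target_base_url(details)
timeout = timeout_seconds(details)
```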

Query Equality Conditions

Query conditions are a list of equality conditions that is interpreted as a conjunction (logical AND) of equality statements. See the example below:

    ...
   "conditions": [
        {"key":"stringField", "value": { "value": "foobar", "_type": "StringValue" }, "_type": "EqualityCondition" },
        {"key":"doubleField", "value": { "value": 42.0, "_type": "DoubleValue" }, "_type": "EqualityCondition" },
        {"key":"booleanField", "value": { "value": true, "_type": "BooleanValue" }, "_type": "EqualityCondition" }
    ],
    ...

Each equality condition consists of a key and a value. The key is a string that names the variable, field, or label in the remote monitoring system. The value is a JSON object that holds the actual value, which is one of three types: string, double, or boolean.
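
The conjunction semantics can be sketched as follows. This is a hypothetical in-memory check against a record represented as a dict; a real mirror would more likely translate the conditions into the query language of the target system.

```python
def _same_value(actual, expected: dict) -> bool:
    """Compare a record value with a typed condition value."""
    kind = expected["_type"]
    if kind == "BooleanValue":
        return isinstance(actual, bool) and actual == expected["value"]
    if kind == "DoubleValue":
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(actual, (int, float)) and not isinstance(actual, bool)
        return is_number and float(actual) == float(expected["value"])
    if kind == "StringValue":
        return isinstance(actual, str) and actual == expected["value"]
    return False


def matches(record: dict, conditions: list) -> bool:
    """Return True only if every equality condition holds (logical AND)."""
    return all(
        condition["key"] in record
        and _same_value(record[condition["key"]], condition["value"])
        for condition in conditions
    )


conditions = [
    {"key": "stringField", "value": {"value": "foobar", "_type": "StringValue"}, "_type": "EqualityCondition"},
    {"key": "doubleField", "value": {"value": 42.0, "_type": "DoubleValue"}, "_type": "EqualityCondition"},
    {"key": "booleanField", "value": {"value": True, "_type": "BooleanValue"}, "_type": "EqualityCondition"},
]
record = {"stringField": "foobar", "doubleField": 42.0, "booleanField": True}
```

With this sketch, an empty condition list matches every record, which is the usual convention for an empty conjunction.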

Time Range

The time range parameters startTime and endTime hold timestamps in epoch milliseconds that delimit the query date range.

    ...
    "startTime": 1504174208940,
    "endTime": 1504347008940,    
    ...
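
Many target systems expect timestamps in other units, so a mirror usually needs a conversion step. The helper names below are illustrative only; the sketch uses integer arithmetic to avoid floating-point rounding of millisecond values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to epoch milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


start = from_epoch_millis(1504174208940)
end = from_epoch_millis(1504347008940)
query_span = end - start  # the example range above covers exactly two days
```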

Result Set Limit

The limit field asks the mirror to return only a subset of the results.

    ...
    "limit": 42,
    ...

Common Response Information

The Mirror is expected to include the header “x-mirror-api-key” in its response, and the StackState Mirror Plugin should compare it with the configured value.
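
The comparison on the plugin side could look like the sketch below. The function name and the dict representation of the response headers are hypothetical, and the constant-time comparison is a design choice of this sketch rather than documented plugin behavior.

```python
import hmac
from typing import Optional


def api_key_matches(response_headers: dict, configured_key: str) -> bool:
    """Check the x-mirror-api-key response header against the configured key."""
    received: Optional[str] = None
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-mirror-api-key":
            received = value
            break
    if received is None:
        return False
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(received.encode("utf-8"), configured_key.encode("utf-8"))
```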

Test Connection

This API tests the connection to the remote metric system.

Path: api/connection

Test Connection Request:

{
  "_type": "TestConnectionRequest",
  "connectionDetails" : {
    "port": 9900,
    "host": "prometheus.local.com",
    "requestTimeout": 15000,
    "nan_interpretation": "ZERO"
  }
}

Test Connection Response:

If the Mirror can successfully connect to the remote metric system, it should reply with a success response.

Example of a success response:

{"status": "OK", "_type": "TestConnectionResponse"}

If an error occurs while connecting to the remote metric store, the mirror should return a failure response:

{
  "status": "FAILURE",
  "error": {"_type": "MetricStoreConnectionError", "details": "Prometheus is not healthy."},
  "_type": "TestConnectionResponse"
}

The error field may contain the detailed error.
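
On the mirror side, the two response shapes above can be produced by a small helper. This is a sketch: the function name is hypothetical, and it always reports failures as MetricStoreConnectionError, as in the example above.

```python
from typing import Optional


def connection_response(error_details: Optional[str] = None) -> dict:
    """Build a TestConnectionResponse for a successful or failed connection check."""
    if error_details is None:
        return {"status": "OK", "_type": "TestConnectionResponse"}
    return {
        "status": "FAILURE",
        "error": {"_type": "MetricStoreConnectionError", "details": error_details},
        "_type": "TestConnectionResponse",
    }


ok = connection_response()
failed = connection_response("Prometheus is not healthy.")
```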

Field Names

The API retrieves field names from the remote monitoring system.

Path: api/field/name

Request:

{
    "connectionDetails": { "host": "localhost", "port": 9000, "requestTimeout": 15000 },
    "query": {
        "conditions": [
            {"key":"stringField", "value": { "value": "foobar", "_type": "StringValue" }, "_type": "EqualityCondition" },
            {"key":"doubleField", "value": { "value": 42.0, "_type": "DoubleValue" }, "_type": "EqualityCondition" },
            {"key":"booleanField", "value": { "value": true, "_type": "BooleanValue" }, "_type": "EqualityCondition" }
        ],
        "startTime": 1504174208940,
        "endTime": 1504347008940,
        "limit": 2147483647,
        "latestFirst": true,
        "_type": "FieldNamesQuery"
    },
    "_type": "FieldNamesRequest"
}

The FieldNamesRequest holds a FieldNamesQuery. The query contains conditions, startTime, endTime, and limit, which act as a filter for the resulting field list. These parameters help the user progressively refine the available field names while configuring a telemetry stream.

A successful response should contain the list of field descriptors in the format below:

{
  "fields": [
    {"_type": "FieldDescriptor", "classified": false, "fieldName": "field1", "fieldType": "STRING"},
    {"_type": "FieldDescriptor", "classified": false, "fieldName": "field2", "fieldType": "BOOLEAN"},
    {"_type": "FieldDescriptor", "classified": false, "fieldName": "field3", "fieldType": "DOUBLE"}
  ],
  "isPartial": false,
  "_type":"FieldNamesResponse"
}

The interpretation of the field descriptor is given below:

  • fieldName - the name of the field.
  • fieldType - one of STRING, DOUBLE, or BOOLEAN, matching the value types used in the examples in this document
  • classified - indicates that special care needs to be taken when logging or displaying the values of this field.
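
A mirror could derive field descriptors from a sample record of the target system, as in the sketch below. The type inference, the set of classified field names, and the way isPartial is set are assumptions of this example. The type name DOUBLE follows the response example above.

```python
from typing import Optional


def describe_field(name: str, sample_value, classified_names=frozenset()) -> dict:
    """Build a FieldDescriptor by inspecting one sample value."""
    if isinstance(sample_value, bool):  # check bool first: bool is also an int
        field_type = "BOOLEAN"
    elif isinstance(sample_value, (int, float)):
        field_type = "DOUBLE"
    else:
        field_type = "STRING"
    return {
        "_type": "FieldDescriptor",
        "classified": name in classified_names,
        "fieldName": name,
        "fieldType": field_type,
    }


def field_names_response(sample: dict, classified_names=frozenset(), limit: Optional[int] = None) -> dict:
    """Build a FieldNamesResponse, honoring an optional limit."""
    names = sorted(sample)
    selected = names if limit is None else names[:limit]
    return {
        "fields": [describe_field(n, sample[n], classified_names) for n in selected],
        # Assumption: isPartial signals that the limit cut off some fields.
        "isPartial": len(selected) < len(names),
        "_type": "FieldNamesResponse",
    }


response = field_names_response(
    {"field1": "abc", "field2": True, "field3": 1.5},
    classified_names=frozenset({"field1"}),
)
```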

Field Values

The API retrieves field values from the remote monitoring system.

Path: api/field/value

The FieldValuesRequest specifies the query for fetching the values of a field identified by a field descriptor. The field descriptor is mandatory. All other parameters act as filters that refine the value query. The latestFirst parameter indicates that field values should be returned sorted by timestamp, with the latest values first. The fieldValuePrefix parameter specifies a prefix filter for the resulting set of field values.

{
    "connectionDetails": {
        "host": "localhost",
        "port": 9000,
        "requestTimeout": 15000
    },
    "query": {
        "conditions": [
            {"key": "stringField", "value": { "value": "foobar", "_type": "StringValue" }, "_type": "EqualityCondition" }
        ],
        "field": { "fieldName": "string", "fieldType": "STRING", "classified": false, "_type": "FieldDescriptor"},
        "fieldValuePrefix": "cpu.",
        "startTime": 1504174208940,
        "endTime": 1505124608940,
        "limit": 2147483647,
        "offset": 0,
        "latestFirst": true,
        "_type": "FieldValuesQuery"
    },
    "_type":"FieldValuesRequest"
}

The FieldValuesResponse holds the list of possible values:

{
    "values": [
        {"value": "value1", "_type": "CompleteValue"},
        {"value": "cpu.*", "_type": "FieldValuePattern"}
    ],
    "isPartial": false,
    "_type":"FieldValuesResponse"
}

There are two types of values: CompleteValue and FieldValuePattern. A CompleteValue indicates that the value field contains a full value token. A FieldValuePattern specifies a partial value that can be used as the fieldValuePrefix in subsequent refinement requests.
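
The prefix filter can be sketched as below for the simplest case, in which the mirror returns only CompleteValue entries. How a mirror decides to emit a FieldValuePattern is not described here, so this sketch leaves it out. Setting isPartial when the limit truncates the result is an assumption.

```python
def field_values_response(all_values, prefix: str, limit: int) -> dict:
    """Filter known values by prefix and wrap them as CompleteValue entries."""
    matching = sorted(v for v in set(all_values) if v.startswith(prefix or ""))
    return {
        "values": [{"value": v, "_type": "CompleteValue"} for v in matching[:limit]],
        # Assumption: isPartial reports that the limit cut off some values.
        "isPartial": len(matching) > limit,
        "_type": "FieldValuesResponse",
    }


known = ["cpu.user", "cpu.system", "mem.free", "cpu.user"]
values_response = field_values_response(known, "cpu.", limit=10)
```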

Get Metric Values

Path: api/metric

The Get Metric Values API fetches metric values. There are two types of metric queries: raw and aggregated.

An example of a raw metric request:

{
    "connectionDetails": { "host": "localhost", "port": 9000, "requestTimeout": 15000 },
    "query": {
        "conditions": [
            {"key": "string", "value": {"value": "stringValue2", "_type": "StringValue"}, "_type": "EqualityCondition"},
            {"key": "double", "value": {"value": 2.0, "_type": "DoubleValue"}, "_type": "EqualityCondition"},
            {"key": "boolean", "value": {"value": true, "_type": "BooleanValue"}, "_type": "EqualityCondition"}
        ],
        "startTime": 1504400400000,
        "endTime": 1504411200000,
        "metricField": "rawValue",
        "limit": 100500,
        "_type": "MetricsQuery"
    },
    "_type": "MetricsRequest"
}

A raw metric request fetches raw (non-aggregated) values from the remote metric store. The mirror can recognize this query by the absence of the optional aggregation object in the query field. Besides the common fields, there is metricField, which optionally indicates the source field for metric values.

An example of a response to a raw query is given below:
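
The distinction between the two query types can be checked in a couple of lines. The function name below is hypothetical, and a null aggregation value is treated the same as a missing one, which is an assumption.

```python
def query_kind(request: dict) -> str:
    """Classify a MetricsRequest as 'raw' or 'aggregated'."""
    aggregation = request["query"].get("aggregation")
    return "raw" if aggregation is None else "aggregated"


raw_request = {"query": {"metricField": "rawValue", "_type": "MetricsQuery"}, "_type": "MetricsRequest"}
aggregated_request = {
    "query": {
        "aggregation": {"method": "MAX", "bucketSizeMillis": 3600000, "_type": "Aggregation"},
        "metricField": "double",
        "_type": "MetricsQuery",
    },
    "_type": "MetricsRequest",
}
```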

{
    "telemetry": {
        "points": [
            [1.0, 1555408501000],
            [2.0, 1555408601000],
            [3.0, 1555408701000]
        ],
        "dataFormat": ["value", "timestamp"],
        "isPartial": false,
        "_type": "RawMetricTelemetry"
    },
    "_type": "MetricsResponse"
}

The fields have the following meaning:

  • the points list contains several sublists, each representing one data point
  • the position of each named value within a data point sublist is specified by the dataFormat field
  • the isPartial field indicates whether the response has been truncated, either by applying the limit field or by the metric store itself. The user should normally execute another metric request with the last point's timestamp as request.query.startTime to retrieve the truncated values.

An example of an aggregated metric query is given below:
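
The sketch below shows how a client might decode the points with dataFormat and prepare the follow-up request for a truncated response. Both function names are hypothetical. Note that reusing the last timestamp as startTime may return the last point again, so a caller may need to drop duplicates.

```python
import copy
from typing import Optional


def decode_points(telemetry: dict) -> list:
    """Turn positional point sublists into dicts keyed by dataFormat names."""
    names = telemetry["dataFormat"]
    return [dict(zip(names, point)) for point in telemetry["points"]]


def follow_up_request(request: dict, telemetry: dict) -> Optional[dict]:
    """Return a copy of the request that continues after a truncated response."""
    if not telemetry["isPartial"] or not telemetry["points"]:
        return None
    ts_index = telemetry["dataFormat"].index("timestamp")
    last_timestamp = max(point[ts_index] for point in telemetry["points"])
    next_request = copy.deepcopy(request)  # leave the original request untouched
    next_request["query"]["startTime"] = last_timestamp
    return next_request


telemetry = {
    "points": [[1.0, 1555408501000], [2.0, 1555408601000], [3.0, 1555408701000]],
    "dataFormat": ["value", "timestamp"],
    "isPartial": False,
    "_type": "RawMetricTelemetry",
}
decoded = decode_points(telemetry)
```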

{
    "connectionDetails": { "host": "localhost", "port": 9000, "requestTimeout": 15000 },
    "query":{
        "conditions":[
            {"key":"string", "value": {"value": "stringValue2", "_type":"StringValue"}, "_type": "EqualityCondition"},
            {"key":"double", "value": {"value": 2.0, "_type":"DoubleValue"}, "_type": "EqualityCondition"},
            {"key":"boolean", "value": {"value": true, "_type":"BooleanValue"}, "_type": "EqualityCondition"}
        ],
        "aggregation": {
            "method": "MAX",
            "bucketSizeMillis": 3600000,
            "_type": "Aggregation"
        },
        "startTime": 1504400400000,
        "endTime": 1504411200000,
        "metricField": "double",
        "limit": 100500,
        "_type": "MetricsQuery"
    },
    "_type": "MetricsRequest"
}

The request is the same as for a raw query with one exception: the aggregation field is present and holds the aggregation method and the aggregation bucket size, bucketSizeMillis. The aggregation is done using a batching window method.

The following aggregation methods are supported by the mirror plugin:

  • MEAN - mean
  • PERCENTILE_25 - 25th percentile
  • PERCENTILE_50 - 50th percentile
  • PERCENTILE_75 - 75th percentile
  • PERCENTILE_90 - 90th percentile
  • PERCENTILE_95 - 95th percentile
  • PERCENTILE_98 - 98th percentile
  • PERCENTILE_99 - 99th percentile
  • MAX - maximum
  • MIN - minimum
  • SUM - sum
  • EVENT_COUNT - the number of occurrences during the bucket interval
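
If the target system cannot aggregate by itself, a mirror could bucket raw points as in the sketch below. It is an illustration under stated assumptions: buckets are non-overlapping and aligned to multiples of bucketSizeMillis since the epoch, the bucket end is the start plus the bucket size, and the percentile methods are left out. None of these details are prescribed by this document.

```python
from collections import defaultdict
from statistics import mean

# Subset of the aggregation methods; percentiles are omitted in this sketch.
AGGREGATORS = {"MEAN": mean, "MAX": max, "MIN": min, "SUM": sum, "EVENT_COUNT": len}


def aggregate(points: list, method: str, bucket_size_millis: int) -> list:
    """Group [value, timestamp] points into fixed buckets and aggregate each one."""
    buckets = defaultdict(list)
    for value, timestamp in points:
        # Assumption: buckets are aligned to multiples of the bucket size.
        bucket_start = timestamp - timestamp % bucket_size_millis
        buckets[bucket_start].append(value)
    aggregator = AGGREGATORS[method]
    # Each output point is [value, startTimestamp, endTimestamp].
    return [
        [float(aggregator(values)), start, start + bucket_size_millis]
        for start, values in sorted(buckets.items())
    ]


raw_points = [[1.0, 0], [3.0, 500], [5.0, 1000]]
maxima = aggregate(raw_points, "MAX", 1000)
```

The output rows follow the order value, start timestamp, end timestamp, so they can be placed in the points list of an aggregated response with a matching dataFormat.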

An example of an aggregated query response is given below:

{
    "telemetry": {
        "points": [
            [1.0, 1555408501000, 1555408511000],
            [2.0, 1555408601000, 1555408611000],
            [3.0, 1555408701000, 1555408711000]
        ],
        "dataFormat": ["value", "startTimestamp", "endTimestamp"],
        "isPartial": false,
        "_type": "AggregatedMetricTelemetry"
    },
    "_type": "MetricsResponse"
}

The fields have the same meaning as for a raw metric query. The only difference is that the points sublists contain startTimestamp and endTimestamp values indicating the start and end time of the aggregated bucket. Their positions are specified in the dataFormat field.

Error responses

If a mirror request fails, the mirror may reply with one of the following errors:

  • Remote metric system connection issues

    {
    "_type": "MetricStoreConnectionError",
    "details": "Unable to connect to remote metrics store."
    }
    
  • Metric not found error

    {
    "_type": "MetricNotFoundError",
    "metric": "service_request_seconds",
    "details": "Metric is not configured."
    }
    
  • Mirror or remote metric system does not support metric type

    {
    "_type": "UnsupportedFieldTypeError",
    "mirrorType": "BOOLEAN"
    }
    
  • If the failure does not fall into any of the previous categories, the mirror can return a generic RemoteMirrorError.

    {
    "_type": "RemoteMirrorError",
    "summary": "Arbitrary remote error summary",
    "details": "Arbitrary remote error details"
    }
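
A client that receives one of these payloads could turn it into an exception, as sketched below. The exception class and function name are hypothetical. The sketch only inspects the top-level _type of the payload, because this document does not specify how errors are signaled at the HTTP level.

```python
ERROR_TYPES = {
    "MetricStoreConnectionError",
    "MetricNotFoundError",
    "UnsupportedFieldTypeError",
    "RemoteMirrorError",
}


class MirrorError(Exception):
    """Raised when a mirror replies with one of the error payloads."""

    def __init__(self, error_type: str, payload: dict):
        self.error_type = error_type
        self.payload = payload
        message = payload.get("details") or payload.get("summary") or error_type
        super().__init__(f"{error_type}: {message}")


def raise_for_error(payload: dict) -> dict:
    """Return the payload unchanged, or raise MirrorError for an error payload."""
    error_type = payload.get("_type")
    if error_type in ERROR_TYPES:
        raise MirrorError(error_type, payload)
    return payload
```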