Stackstate-etcd Integration

Overview

Capture etcd metrics in Stackstate to:

  • Monitor the health of your etcd cluster.
  • Know when host configurations may be out of sync.
  • Correlate the performance of etcd with the rest of your applications.

Setup

Installation

The etcd check is packaged with the Agent, so simply install the Agent.

Configuration

  1. Configure the Agent to connect to etcd by editing conf.d/etcd.yaml:

    init_config:

    instances:
      # url: the API endpoint of your etcd instance
      - url: "https://server:port"
        # timeout: time to wait on an etcd API request
        timeout: 5

  2. Restart the Agent
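Before restarting, you may want to confirm that the configured url answers within the timeout. The sketch below is a minimal Python check under stated assumptions: it assumes the integration reads etcd's v2 statistics endpoints (/v2/stats/self, /v2/stats/store, /v2/stats/leader), which comes from the etcd v2 API rather than from this document, and stats_urls and fetch_stats are hypothetical helpers, not part of the Agent.

```python
import json
import urllib.request
from typing import Dict

# Assumption: the check reads etcd's v2 statistics API. These paths come from
# the etcd v2 API, not from this integration's documentation.
STATS_PATHS = {
    "self": "/v2/stats/self",
    "store": "/v2/stats/store",
    "leader": "/v2/stats/leader",
}


def stats_urls(instance: Dict) -> Dict[str, str]:
    """Build the stats URLs for one `instances` entry (hypothetical helper)."""
    url = instance.get("url")
    if not url:
        raise ValueError("each instance needs a 'url'")
    base = url.rstrip("/")
    return {kind: base + path for kind, path in STATS_PATHS.items()}


def fetch_stats(url: str, timeout: float = 5) -> Dict:
    """GET one endpoint and decode its JSON body; raises on network/HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Example instance; localhost:2379 is etcd's default client port.
    instance = {"url": "http://localhost:2379", "timeout": 5}
    for kind, url in stats_urls(instance).items():
        try:
            fetch_stats(url, instance.get("timeout", 5))
            print(kind, "OK")
        except Exception as exc:  # URLError, HTTPError, socket timeout, bad JSON
            print(kind, "FAILED:", exc)
```

Edit the example instance to match your conf.d/etcd.yaml before running it. On a follower, the leader endpoint may report an error even when the node is healthy, because etcd serves leader statistics from the current leader.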

Validation

Run the Agent's info command and verify that the integration check has passed. The output of the command should contain a section similar to the following:

    Checks
    ======

      [...]

      etcd
      ----
          - instance #0 [OK]
          - Collected 8 metrics & 0 events

Data Collected

Metrics

Metric                               Type   Description                                                Unit
-----------------------------------  -----  ---------------------------------------------------------  -----------
etcd.store.gets.success              gauge  Rate of successful get requests                            request
etcd.store.gets.fail                 gauge  Rate of failed get requests                                request
etcd.store.sets.success              gauge  Rate of successful set requests                            request
etcd.store.sets.fail                 gauge  Rate of failed set requests                                request
etcd.store.delete.success            gauge  Rate of successful delete requests                         request
etcd.store.delete.fail               gauge  Rate of failed delete requests                             request
etcd.store.update.success            gauge  Rate of successful update requests                         request
etcd.store.update.fail               gauge  Rate of failed update requests                             request
etcd.store.create.success            gauge  Rate of successful create requests                         request
etcd.store.create.fail               gauge  Rate of failed create requests                             request
etcd.store.compareandswap.success    gauge  Rate of successful compare-and-swap requests               request
etcd.store.compareandswap.fail       gauge  Rate of failed compare-and-swap requests                   request
etcd.store.compareanddelete.success  gauge  Rate of successful compare-and-delete requests             request
etcd.store.compareanddelete.fail     gauge  Rate of failed compare-and-delete requests                 request
etcd.store.expire.count              gauge  Rate of expired keys                                       eviction
etcd.store.watchers                  gauge  Number of watchers
etcd.self.send.pkgrate               gauge  Rate of packets sent                                       packet
etcd.self.send.bandwidthrate         gauge  Rate of bytes sent                                         byte
etcd.self.recv.pkgrate               gauge  Rate of packets received                                   packet
etcd.self.recv.bandwidthrate         gauge  Rate of bytes received                                     byte
etcd.self.recv.appendrequest.count   gauge  Rate of append requests this node has processed            request
etcd.self.send.appendrequest.count   gauge  Rate of append requests this node has sent                 request
etcd.leader.counts.fail              gauge  Rate of failed Raft RPC requests                           request
etcd.leader.counts.success           gauge  Rate of successful Raft RPC requests                       request
etcd.leader.latency.current          gauge  Current latency to each peer in the cluster                millisecond
etcd.leader.latency.avg              gauge  Average latency to each peer in the cluster                millisecond
etcd.leader.latency.min              gauge  Minimum latency to each peer in the cluster                millisecond
etcd.leader.latency.max              gauge  Maximum latency to each peer in the cluster                millisecond
etcd.leader.latency.stddev           gauge  Standard deviation of latency to each peer in the cluster  millisecond
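To relate the table to etcd's own statistics, the following sketch maps etcd v2 stats JSON onto the metric names listed above. The JSON field names (getsSuccess, sendPkgRate, followers, standardDeviation, and so on) are those of the etcd v2 statistics API; the mapping itself is an assumption inferred from the metric names, not taken from the check's source, and the follower tag is hypothetical. Values are passed through unchanged, with no rate conversion.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping from etcd v2 stats JSON fields to the metric names above;
# inferred from the names, not taken from the check's source.
STORE_FIELDS = {
    "getsSuccess": "etcd.store.gets.success",
    "getsFail": "etcd.store.gets.fail",
    "setsSuccess": "etcd.store.sets.success",
    "setsFail": "etcd.store.sets.fail",
    "deleteSuccess": "etcd.store.delete.success",
    "deleteFail": "etcd.store.delete.fail",
    "updateSuccess": "etcd.store.update.success",
    "updateFail": "etcd.store.update.fail",
    "createSuccess": "etcd.store.create.success",
    "createFail": "etcd.store.create.fail",
    "compareAndSwapSuccess": "etcd.store.compareandswap.success",
    "compareAndSwapFail": "etcd.store.compareandswap.fail",
    "compareAndDeleteSuccess": "etcd.store.compareanddelete.success",
    "compareAndDeleteFail": "etcd.store.compareanddelete.fail",
    "expireCount": "etcd.store.expire.count",
    "watchers": "etcd.store.watchers",
}
SELF_FIELDS = {
    "sendPkgRate": "etcd.self.send.pkgrate",
    "sendBandwidthRate": "etcd.self.send.bandwidthrate",
    "recvPkgRate": "etcd.self.recv.pkgrate",
    "recvBandwidthRate": "etcd.self.recv.bandwidthrate",
    "recvAppendRequestCnt": "etcd.self.recv.appendrequest.count",
    "sendAppendRequestCnt": "etcd.self.send.appendrequest.count",
}
COUNT_FIELDS = {"fail": "etcd.leader.counts.fail", "success": "etcd.leader.counts.success"}
LATENCY_FIELDS = {
    "current": "etcd.leader.latency.current",
    "average": "etcd.leader.latency.avg",
    "minimum": "etcd.leader.latency.min",
    "maximum": "etcd.leader.latency.max",
    "standardDeviation": "etcd.leader.latency.stddev",
}

Metric = Tuple[str, float, List[str]]  # (name, value, tags)


def map_fields(stats: Dict, fields: Dict[str, str],
               tags: Optional[List[str]] = None) -> List[Metric]:
    """Emit one metric per field present in `stats`; missing fields are skipped."""
    tags = tags or []
    return [(name, float(stats[key]), list(tags))
            for key, name in fields.items() if key in stats]


def leader_metrics(leader_stats: Dict) -> List[Metric]:
    """Per-follower Raft counts and latency from /v2/stats/leader (leader only)."""
    out: List[Metric] = []
    for follower_id, info in leader_stats.get("followers", {}).items():
        tags = ["follower:" + follower_id]  # hypothetical tag name
        out += map_fields(info.get("counts", {}), COUNT_FIELDS, tags)
        out += map_fields(info.get("latency", {}), LATENCY_FIELDS, tags)
    return out
```

For instance, map_fields({"getsSuccess": 120}, STORE_FIELDS) returns [("etcd.store.gets.success", 120.0, [])]. Fields that etcd leaves out of a response (its self stats may omit zero-valued rates) are simply skipped rather than reported as zero.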

Furthermore, etcd metrics are tagged with etcd_state:leader or etcd_state:follower, depending on the node's state, so you can aggregate metrics by status.

The check returns ‘Critical’ if the Agent cannot collect metrics from your etcd API endpoint.
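The leader/follower tagging and the Critical status can be sketched as follows. It assumes, from the etcd v2 API rather than from this document, that /v2/stats/self reports the node's role in a state field with the values StateLeader or StateFollower; state_tags, check_status, and aggregate_by_state are hypothetical helpers, not the Agent's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumption: etcd v2 /v2/stats/self reports "state" as one of these values.
STATE_TAGS = {
    "StateLeader": "etcd_state:leader",
    "StateFollower": "etcd_state:follower",
}


def state_tags(self_stats: Dict) -> List[str]:
    """Return the etcd_state tag for a node, or no tag if the state is unknown."""
    tag = STATE_TAGS.get(self_stats.get("state", ""))
    return [tag] if tag else []


def check_status(fetch: Callable[[], Dict]) -> str:
    """'CRITICAL' if fetching stats from the endpoint raises, else 'OK'."""
    try:
        fetch()
    except Exception:
        return "CRITICAL"
    return "OK"


def aggregate_by_state(points: List[Tuple[float, List[str]]]) -> Dict[str, float]:
    """Sum metric values per etcd_state tag (e.g. totals across all followers)."""
    totals: Dict[str, float] = defaultdict(float)
    for value, tags in points:
        for tag in tags:
            if tag.startswith("etcd_state:"):
                totals[tag] += value
    return dict(totals)
```

For example, aggregate_by_state([(1.0, ["etcd_state:leader"]), (2.0, ["etcd_state:follower"]), (3.0, ["etcd_state:follower"])]) returns {"etcd_state:leader": 1.0, "etcd_state:follower": 5.0}. Passing the fetch in as a callable keeps check_status independent of how the endpoint is actually queried.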