Sending Metrics with StsStatsD


This guide explains how to send your application’s custom metrics to StackState. Sending your application’s custom metrics to StackState will let you correlate what’s happening with your application, your users and your system.

Metrics are collected by sending them to StatsD, a small metrics aggregation server that is bundled with the StackState Agent. You can read about how it works here. If you want to dive into code right away, read on.

In this tutorial, we’ll cover some common instrumentation use cases, like:

  • How to count web page views
  • How to time database queries
  • How to measure the amount of free memory


First off, install the StackState Agent, which contains our StatsD server, and make sure it’s running.

Next, let’s set up a client library for your language.

First, install the module:
$ pip install stackstate
And import it, so it's ready to use:
from stackstate import statsd

First, install the module:
$ gem install stsstatsd-ruby
And add it to your code:
# Import the library
require 'stackstate/statsd'

# Create a statsd client instance.
statsd =

Now we’re ready to roll.

Metric names

There are a few rules to stick to when naming metrics:

  • Metric names must start with a letter
  • Can only contain ascii alphanumerics, underscore and periods (other characters will get converted to underscores)
  • Should not exceed 200 characters (though less than 100 is generally preferred from a UI perspective)
  • Unicode is not supported
  • We recommend avoiding spaces
Metrics reported by the Agent are in a pseudo-hierarchical dotted format (e.g. http.nginx.response_time). We say pseudo-hierarchical because we’re not actually enforcing a hierarchy or doing anything with it, but we have aspirations to use it to infer things about servers (e.g. “hey, I see hostA and hostB are reporting ‘http.nginx.*‘, those must be web frontends”).


Counters are used to (ahem) count things. Let’s walk through a common example - counting web page views. To achieve this, we’ll increment a metric called web.page_views each time our render_page function is called.

def render_page():
    """ Render a web page. """
    return 'Hello World!'
def render_page()
  # Render a web page.
  return 'Hello World!'

That’s it. With this one line of code we can start graphing the data. Here’s an example:

Note that StatsD counters are normalized over the flush interval to report per-second units. In the graph above, the marker is reporting 35.33 web page views per second at ~15:24. In contrast, if one person visited the webpage each second, the graph would be a flat line at y = 1. To increment or measure values over time, please see gauges.

We can also count by arbitrary numbers. Suppose we wanted to count the number of bytes processed by a file uploading service. We’ll increment a metric called file_service.bytes_uploaded by the size of the file each time our upload_file function is called:

def upload_file(file):
    statsd.increment('file_service.bytes_uploaded', file.size())
    return 'File uploaded!'
def upload_file(file)
  statsd.count('file_service.bytes_uploaded', file.size())
  return 'File uploaded!'

Note that for counters coming from another source that are ever-increasing and never reset – for example, the number of queries from MySQL over time – we track the rate between flushed values. While there currently isn’t an elegant solution to get raw counts within StackState, you may want to apply a function to your series like cumulative sum or integral.


Gauges measure the value of a particular thing over time. Suppose a developer wanted to track the amount of free memory on a machine, we can periodically sample that value as the metric

# Record the amount of free memory every ten seconds.
while True:
    statsd.gauge('', get_free_memory())
# Record the amount of free memory every ten seconds.
while true do
    statsd.gauge('', get_free_memory())


Histograms measure the statistical distribution of a set of values. Suppose we wanted to measure the duration of a database query, we can sample each query time with the metric database.query.time.

# Track the run time of the database query.
start_time = time.time()
results = db.query()
duration = time.time() - start_time
statsd.histogram('database.query.time', duration)

# We can also use the `timed` decorator as a short-hand for timing functions.
def get_data():
    return db.query()
start_time =
results = db.query()
duration = - start_time
statsd.histogram('database.query.time', duration)

# We can also use the `time` helper as a short-hand for timing blocks
# of code.
statsd.time('database.query.time') do
  return db.query()

The above instrumentation will produce the following metrics:

  • database.query.time.count - the number of times this metric was sampled
  • database.query.time.avg - the average time of the sampled values
  • database.query.time.median - the median sampled value
  • database.query.time.max - the maximum sampled value
  • database.query.time.95percentile - the 95th percentile sampled value

These metrics give insight into how different each query time is. We can see how long the query usually takes by graphing the median. We can see how long most queries take by graphing the 95percentile.

For this toy example, let’s say a query time of 1 second is acceptable. Our median query time (graphed in purple) is usually less than 100 milliseconds, which is great. But unfortunately, our 95th percentile (graphed in blue) has large spikes sometimes nearing three seconds, which is unacceptable. This means most of our queries are running just fine, but our worst ones are very bad. If the 95th percentile was close to the median, than we would know that almost all of our queries are performing just fine.

Histograms aren't just for measuring times. They can be used to measure the distribution of any type of value, like the size of uploaded files or classroom test scores.

Service Checks

Service checks are used to send information about the status of a service.

from stackstate.api.constants import CheckStatus

# Report the status of an app.
name = 'web.app1'
status = CheckStatus.OK
message = 'Response: 200 OK'

statsd.service_check(check_name=name, status=status, message=message)
# Report the status of an app.
name = 'web.app1'
status = StackState::Statsd::OK
opts = {
  'message' => 'Response: 200 OK'

statsd.service_check(name, status, opts)

After a service check has been reported, you can use it to trigger a Custom Check monitor.


Sets are used to count the number of unique elements in a group. If you want to track the number of unique visitors to your site, sets are a great way to do that.

def login(self, user_id):
    # Log the user in ...
    statsd.set('users.uniques', user_id)
def login(self, user_id)
    # Log the user in ...
    statsd.set('users.uniques', user_id)


Tags are a way of adding dimensions to metrics, so they can be sliced, diced, aggregated and compared on the front end. Suppose we wanted to measure the performance of two algorithms in the real world. We could sample one metric algorithm.run_time and specify each version with a tag:

@statsd.timed('algorithm.run_time', tags=['algorithm:one'])
def algorithm_one():
    # Do fancy things here ...

@statsd.timed('algorithm.run_time', tags=['algorithm:two'])
def algorithm_two():
    # Do fancy things here ...
def algorithm_one()
  statsd.timed('algorithm.run_time', :tags => ['algorithm:one']) do
    # Do fancy things here ...

def algorithm_two()
  statsd.timed('algorithm.run_time', :tags => ['algorithm:two']) do
    # Do different fancy things here ...

On the front end, the metric instances can be aggregated as sums or averages, or the minimum/maximum can be reported. The metrics can also be broken down by each tag ‘key’ (in the key:value syntax). For instance, we could run a query like this:

avg:algorithm.run_time{*} by {algorithm}

In this query, all instances of this metric (e.g. across all hosts, indicated by *) are averaged (avg) and broken down by the tag key algorithm.

We store one time series per host + metric + tag combination on our backend, thus we cannot support infinitely bounded tags. Please don't include endlessly growing tags in your metrics, like timestamps or user ids. Please limit each metric to 1000 tags.

Tags must start with a letter, and after that may contain alphanumerics, underscores, minuses, colons, periods and slashes. Other characters will get converted to underscores. Tags can be up to 200 characters long and support unicode. Tags will be converted to lowercase as well.

For optimal functionality, we recommend constructing tags that use the key:value syntax. Examples of commonly used metric tag keys are env, instance, name, and role. Note that device, host, and source are treated specially and cannot be specified in the standard way. Check out some of our other docs for how to use these:

Sample Rates

Each metric point is sent over UDP to the StatsD server. This can incur a lot of overhead for performance intensive code paths. To work around this, StatsD supports sample rates, which allows sending a metric a fraction of the time and scaling up correctly on the server.

The following code will only send points half of the time:

while True:
  statsd.increment('loop.count', sample_rate=0.5)
while true do
  statsd.increment('loop.count', :sample_rate => 0.5)

Other Submission Methods

Using the StatsD server bundled with the StackState Agent is the simplest way of submitting metrics to StackState, but it’s not the only one. Here are some other ways of getting your metrics data into StackState:

  • Submit metrics directly to StackState's HTTP API