Open telemetry collector
StackState v6.0
Last updated
StackState v6.0
Last updated
The OpenTelemetry Collector offers a vendor-agnostic implementation to receive, process and export telemetry data. Applications instrumented with Open Telemetry SDKs can use the collector to send telemetry data to StackState (traces and metrics).
Your applications, when set up with OpenTelemetry SDKs, can use the collector to send telemetry data, like traces and metrics, straight to StackState. The collector is set up to receive this data by default via OTLP, the native open telemetry protocol. It can also receive data in other formats provided by other instrumentation SDKs like Jaeger and Zipkin for traces, and Influx and Prometheus for metrics.
Usually, the collector is running close to your application, like in the same Kubernetes cluster, making the process efficient.
For StackState integration, it's simple: StackState offers an OTLP endpoint using the gRPC protocol and uses bearer tokens for authentication. This means configuring your OpenTelemetry collector to send data to StackState is easy and standardized.
A Kubernetes cluster with an application that is
An API key for StackState
Permissions to deploy the open telemetry collector in a namespace on the cluster (i.e. create resources like deployments and configmaps in a namespace). To be able to enrich the data with Kubernetes attributes permission is needed to create a and role binding.
To install and configure the collector for usage with StackState we'll use the and add the configuration needed for StackState:
helm chart configuration
generating metrics from traces
sending the data to StackState
combine it all together in pipelines
Here is the full values file needed, continue reading below the file for an explanation of the different parts. Or skip ahead to the next step, but make sure to replace:
<otlp-stackstate-endpoint>
with the OTLP endpoint of your StackState. If, for example, you access StackState on play.stackstate.com
the OTLP endpoint is otlp-play.stackstate.com
. So simply prefixing otlp-
to the normal StackState url will do.
<your-cluster-name>
with the cluster name you configured in StackState. This must be the same cluster name used when installing the StackState agent. Using a differnt cluster name will result in an empty traces perspective for Kubernetes components.
The Kubernetes attributes and the span metrics namespace are required for StackState to provide full functionality.
The config
section customizes the collector config itself and is discussed in the next section. The other parts are:
ports
: Used to enable the metrics port such that the collector can scrape its own metrics
presets
: Used to enable the default configuration for adding Kubernetes metadata as attributes, this includes Kubernetes labels and metadata like namespace, pod, deployment etc. Enabling the metadata also introduces the cluster role and role binding mentioned in the pre-requisites.
The service
section determines what components of the collector are enabled. The configuration for those components comes from the other sections (extensions, receivers, connectors, processors and exporters). The extensions
section enables:
health_check
, doesn't need additional configuration but adds an endpoint for Kubernetes liveness and readiness probes
bearertokenauth
, this extension adds an authentication header to each request with the StackState API key. In its configuration, we can see it is getting the StackState API key from the environment variable API_KEY
.
The pipelines
section defines pipelines for the traces and metrics. The metrics pipeline defines:
receivers
, to receive metrics from instrumented applications (via the OTLP protocol, otlp
), from spans (the spanmetrics
connector) and by scraping Prometheus endpoints (the prometheus
receiver). The latter is configured by default in the collector Helm chart to scrape the collectors own metrics
processors
: The memory_limiter
helps to prevent out-of-memory errors. The batch
processor helps better compress the data and reduce the number of outgoing connections required to transmit the data. The resource
processor adds additional resource attributes (discussed separately)
exporters
: The debug
exporter simply logs to stdout which helps when troubleshooting. The otlp/stackstate
exporter sends telemetry data to StackState using the OTLP protocol. It is configured to use the bearertokenauth extension for authentication to send data to the StackState OTLP endpoint.
For traces, there are 3 pipelines that are connected:
traces
: The pipeline that receives traces from SDKs (via the otlp
receiver) and does the initial processing using the same processors as for metrics. It exports into a router which routes all spans to both other traces pipelines. This setup makes it possible to calculate span metrics for all spans while applying sampling to the traces that are exported.
traces/spanmetrics
: Use the spanmetrics
connector as an exporter to generate metrics from the spans (otel_span_duration
and otel_span_calls
). It is configured to not report time series anymore when no spans have been observed for 5 minutes. StackState expects the span metrics to be prefixed with otel_span_
, which is taken care of by the namespace
configuration.
The resource
processor is configured for both metrics and traces. It adds extra resource attributes:
The k8s.cluster.name
is added by providing the cluster name in the configuration. StackState needs the cluster name and Open Telemetry does not have a consistent way of determining it. Because some SDKs, in some environments, provide a cluster name that does not match what StackState expects the cluster name is an upsert
(overwrites any pre-existing value).
The service.instance.id
is added based on the pod uid. It is recommended to always provide a service instance id, and the pod uid is an easy way to get a unique identifier if the SDKs don't provide one.
It is highly recommended to use sampling for traces:
To manage resource usage by only processing and storing the most relevant traces
To manage costs and have predictable costs
To reduce noise and focus on the important traces only, for example by filtering out health checks
Have predictable cost by having a predictable trace volume
Have a large sample of all errors
Have a large sample of all slow traces
Have a sample of all other traces to see the normal application behavior
Criteria 2 and 3 can only be fulfilled by tail sampling. Let's look at the sampling policies used in the configuration of the tail sampler now:
There is only one top-level policy, it is a composite
policy. It uses a rate limit, allowing at most 500 traces per second, giving a predictable trace volume. It uses other policies as sub-policies to make the actual sampling decissions.
The errors
policy is of type status_code
and is configured to only sample traces that contain errors. 33% of the rate limit is reserved for errors, via the rate_allocation
section of the composite policy.
The slow-traces
policy is of type latency
and filters all traces slower than 1 second. 33% of the rate limits is reserved for the slow traces.
The rest
policy is of the always_sample
type. It will sample all traces until it hits the rate limit enforced by the composite policy, which is 34% of the total rate limit of 500 traces.
There are many more policies available that can be added to the configuration when needed. For example, it is possible to filter traces based on certain attributes (only for a specific application or customer). The tail sampler can also be replaced with the probabilistic sampler. For all configuration options please use the documentation of these processors:
The collector needs a Kubernetes secret with the StackState API key. Create that in the same namespace (here we are using the open-telemetry
namespace) where the collector will be installed (replace <stackstate-api-key>
with your API key):
StackState supports two types of keys:
Receiver API Key
Ingestion API Key
You can find the API key for StackState on the Kubernetes Stackpack installation screen:
Open StackState
Navigate to StackPacks and select the Kubernetes StackPack
Open one of the installed instances
Scroll down to the first set of installation instructions. It shows the API key as STACKSTATE_RECEIVER_API_KEY
in text and as 'stackstate.apiKey'
in the command.
To deploy the collector first make sure you have the Open Telemetry helm charts repository configured:
Now install the collector, using the configuration defined in the previous steps:
The collector as it is configured now is ready to receive and send telemetry data. The only thing left to do is to update the SDK configuration for your applications to send their telemetry via the collector to the agent.
The Open Telemetry documentation provides much more details on the configuration and alternative installation options:
Open Telemetry Collector configuration: https://opentelemetry.io/docs/collector/configuration/
Kubernetes installation of the collector: https://opentelemetry.io/docs/kubernetes/helm/collector/
Using the Kubernetes operator instead of the collector Helm chart: https://opentelemetry.io/docs/kubernetes/operator/
Open Telemetry sampling: https://opentelemetry.io/blog/2022/tail-sampling/
The suggested configuration includes tail sampling for traces. Sampling can be fully customized and, depending on your applications and the volume of traces, it may be needed to . For example an increase (or decrease) in max_total_spans_per_second
. It is highly recommended to keep sampling enabled to keep resource usage and cost under control.
extraEnvsFrom
: Sets environment variables from the specified secret, in the next step this secret is created for storing the StackState API key (Receiver / )
mode
: Run the collector as a Kubernetes deployment, when to use the other modes is discussed .
traces/sampling
: The pipeline that exports traces to StackState using the OTLP protocol, but uses the tail sampling processor to make the trace volume that is sent to StackState predictable to keep the cost predictable as well. Sampling is discussed in a .
There are 2 approaches for sampling, head sampling and tail sampling. This discusses the pros and cons of both approaches in detail. The collector configuration provided here uses tail sampling to support these requirements:
StackState supports creating multiple Ingestion Keys. This allows you to assign a unique key to each OpenTelemetry Collector for better security and access control. For instructions on generating an Ingestion API Key, refer to the .
Use the to export data to the collector. Follow the to enable the SDK for your applications.