StackState Kubernetes Troubleshooting
StackState can observe connections between services and pods in different clusters, or connections that pass through a service mesh or load balancer. These connections are observed through request tracing. Traced requests result in connections in the topology perspective, which gives insight into the dependencies across an application and helps with finding the root cause of an incident.
Request tracing is done by injecting a unique header (the X-Request-Id header) into all HTTP traffic. This header is observed at both client and server through an eBPF probe installed with the StackState Agent. The observations are sent to StackState, which uses them to determine which clients and servers are connected.
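The correlation idea described above can be illustrated with a small sketch. This is not StackState's implementation: the `Observation` shape, the workload names, and the `connections` helper are all hypothetical, chosen only to show how sightings of the same ID on a client and a server can be turned into a topology edge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One sighting of an X-Request-Id header (hypothetical shape)."""
    request_id: str
    workload: str  # pod or service where the header was seen
    role: str      # "client" or "server"


def connections(observations):
    """Pair client and server workloads that observed the same request ID."""
    by_id = defaultdict(lambda: {"client": set(), "server": set()})
    for obs in observations:
        by_id[obs.request_id][obs.role].add(obs.workload)

    edges = set()
    for sides in by_id.values():
        for client in sides["client"]:
            for server in sides["server"]:
                edges.add((client, server))
    return edges


observed = [
    Observation("id-1", "frontend", "client"),
    Observation("id-1", "checkout", "server"),
    Observation("id-2", "checkout", "client"),
    Observation("id-2", "payments", "server"),
    Observation("id-3", "frontend", "client"),  # no server sighting, so no edge
]
print(sorted(connections(observed)))
# [('checkout', 'payments'), ('frontend', 'checkout')]
```

An ID seen on only one side produces no edge, which is why the header has to be visible at both the client and the server.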
X-Request-Id headers are injected by a sidecar proxy, which the StackState Agent can inject automatically. A mutating webhook injects the sidecar into every pod for which the http-header-injector.stackstate.io/inject: enabled annotation is defined. Sidecar injection is not supported on OpenShift.
Enabling trace header injection is a two-step process:
1. Install the mutating webhook into the cluster by adding --set httpHeaderInjectorWebhook.enabled=true to the helm upgrade invocation when installing the StackState Agent. By default, the sidecar injector generates its own self-signed certificate, which requires cluster roles to install it into the cluster. In a more restricted environment, it is also possible to manage your own certificates.
2. For every pod that has an endpoint which processes HTTP(S) requests, place the annotation http-header-injector.stackstate.io/inject: enabled to have the sidecar injected.
Enabling the mutating webhook will only take effect upon pod restart
If the annotation is placed before the webhook is installed, installing the webhook has no effect until the pods are restarted.
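For workloads managed by a controller such as a Deployment, the annotation belongs on the pod template so that newly created pods carry it. The sketch below builds a patch body for that purpose. The `injection_patch` helper is hypothetical; only the annotation key and value come from this page.

```python
import json

ANNOTATION = "http-header-injector.stackstate.io/inject"


def injection_patch(enabled=True):
    """Build a patch for a workload's pod template annotations.

    A value of None serializes to JSON null, which removes the key
    under merge-patch semantics.
    """
    value = "enabled" if enabled else None
    return {"spec": {"template": {"metadata": {"annotations": {ANNOTATION: value}}}}}


print(json.dumps(injection_patch()))
# {"spec": {"template": {"metadata": {"annotations": {"http-header-injector.stackstate.io/inject": "enabled"}}}}}
```

The printed JSON can be passed to `kubectl patch deployment <name> --patch '<json>'`. Changing the pod template makes the Deployment roll out new pods, so the restart requirement is met as a side effect. If the annotation was already present before the webhook was installed, the patch changes nothing, and `kubectl rollout restart deployment <name>` is needed instead.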
Disabling the trace header injection can be done with the reverse process:
1. Remove the http-header-injector.stackstate.io/inject: enabled annotation from all pods.
2. Redeploy the StackState Agent without the --set httpHeaderInjectorWebhook.enabled=true flag.
Disabling the mutating webhook will only take effect upon pod restart
If step 1 is skipped and only the mutating webhook is disabled, all pods need a restart for the sidecar to be removed.
Request tracing adds a small, fixed amount of CPU overhead for each HTTP request header that is injected and observed. The exact amount depends on the system it runs on, so it's advised to enable this feature in an acceptance environment first and observe the impact before moving to production. The sidecar proxy uses a minimum of 25 MB of memory per pod it's deployed with, up to a maximum of 40 MB.
To add the X-Request-Id header from an existing proxy, two properties are important:
1. Each request/response pair has to get a unique ID.
2. The X-Request-Id header should be added to both request and response, to be observed on both client and server.
Use kubectl to apply the following definition to the Kubernetes cluster:
- applyTo: NETWORK_FILTER
It's also possible to add the X-Request-Id header either on the client side to each request or on the server side to each response. It's important to ensure that each request/response pair gets a unique X-Request-Id value. In addition, if an ID is already present in a request, the response should contain that same ID.
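The two rules above (a unique ID per request/response pair, and echoing an ID that is already present) can be sketched as follows. The function names and the use of plain dictionaries for headers are assumptions made for illustration. A real client or server would apply the same logic inside its own middleware or interceptor.

```python
import uuid

HEADER = "X-Request-Id"


def find_request_id(headers):
    """Return the request ID from a header mapping, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == HEADER.lower():
            return value
    return None


def tag_request(headers):
    """Client side: return a copy of the headers with an ID, keeping any existing ID."""
    tagged = dict(headers)
    if find_request_id(headers) is None:
        tagged[HEADER] = str(uuid.uuid4())
    return tagged


def tag_response(request_headers, response_headers):
    """Server side: echo the request's ID, or generate one if the request had none."""
    request_id = find_request_id(request_headers) or str(uuid.uuid4())
    # Drop any differently-cased copy of the header before setting ours.
    tagged = {k: v for k, v in response_headers.items() if k.lower() != HEADER.lower()}
    tagged[HEADER] = request_id
    return tagged


request = tag_request({"Host": "checkout"})
response = tag_response(request, {"Content-Type": "application/json"})
print(response[HEADER] == request[HEADER])  # True
```

A random UUID is used here only because it is a convenient source of unique values. Any scheme that guarantees a distinct ID per request/response pair would satisfy the first rule.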
- HTTP/1.0 and HTTP/1.1 with keepAlive
- Trace header injection and trace observation on unencrypted traffic
- Trace observation for OpenSSL-encrypted traffic
- Trace header injection alongside Linkerd
- Any load balancer that forwards the X-Request-Id header in requests and responses
- Any cross-cluster networking solution that forwards the X-Request-Id header in requests and responses
To make sure your setup is OK, first validate that the following steps were taken:
- The --set httpHeaderInjectorWebhook.enabled=true flag was set during installation of the StackState Agent
- The pod has the http-header-injector.stackstate.io/inject: enabled annotation
- The pod was restarted
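The annotation check can be scripted against the JSON that `kubectl get pod <name> -o json` prints. The sketch below is a minimal example with a hypothetical helper name and a made-up, trimmed-down pod manifest. It only checks the annotation, not whether the sidecar container is actually present.

```python
import json

ANNOTATION = "http-header-injector.stackstate.io/inject"


def injection_requested(pod):
    """Return True if the pod manifest carries the injection annotation set to 'enabled'."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(ANNOTATION) == "enabled"


# Made-up, trimmed-down example of `kubectl get pod <name> -o json` output.
pod = json.loads("""
{
  "metadata": {
    "name": "checkout-5d9c7b6f4-abcde",
    "annotations": {"http-header-injector.stackstate.io/inject": "enabled"}
  }
}
""")
print(injection_requested(pod))  # True
```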
If this does not resolve the issue, the following could be the cause:
The cluster can have network policies set up that prevent the Kubernetes control-plane apiserver from contacting the mutating webhook which injects the sidecar. To validate this, look at the logs of the kube-apiserver, which either runs in the kube-system namespace or is managed by your cloud provider. An error like the following should be found in those logs:
Failed calling webhook, failing open stackstate-agent-http-header-injector-webhook.stackstate.io: failed calling webhook "stackstate-agent-http-header-injector-webhook.stackstate.io": failed to call webhook: Post "https://stackstate-agent-http-header-injector.monitoring.svc:8443/mutate?timeout=10s": context deadline exceeded
If this happens, adapt your cluster network policies so that the apiserver can reach the mutating webhook.