Troubleshooting

There are a lot of configuration options and, more importantly, every (Kubernetes) environment is slightly different. To find out where the problem lies, the quickest approach is to pick a pod from which telemetry data is expected and then:

  1. Check the beginning of the pod's logs; SDKs log warnings or errors when instrumentation fails at startup.

  2. Also check the pod's logs for any errors related to sending data to the collector.

  3. Check the logs of the collector pod(s) for configuration or initialization errors; these are logged right after the pod starts.

  4. Also check the collector logs for errors related to sending data to SUSE Observability.

The error(s) in the logs usually give a good indication of the problem. Below we list the most common causes for Open Telemetry data not being available for some or all of your instrumented applications. If the problem is not listed here, you can also consult the language-specific SDK documentation or the collector documentation from Open Telemetry.

Use the same Kubernetes cluster name

Make sure you use the same Kubernetes cluster name for the same cluster when:

  • Installing the Open Telemetry Collector

  • Installing the SUSE Observability agent

  • Installing the Kubernetes StackPack

When different names are used for the same cluster, SUSE Observability cannot match the data from Open Telemetry with the data from the SUSE Observability agent, and the traces perspective will remain empty.
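
As an illustration, the same cluster name should appear wherever it is configured. The sketch below is hypothetical: the exact Helm values and processor configuration depend on the charts and values files you use, so treat the keys and names as placeholders.

# Collector configuration (hypothetical layout): the cluster name is added as a resource attribute
processors:
  resource:
    attributes:
      - key: k8s.cluster.name
        value: my-cluster          # must match the name used for the agent and the StackPack
        action: upsert

# Agent installation (hypothetical value key), using the same name:
# helm upgrade --install suse-observability-agent ... --set stackstate.cluster.name=my-cluster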

The collector cannot send data to SUSE Observability

SUSE Observability's OTLP endpoint and API key are misconfigured

If there are connection errors, the OTLP endpoint is probably incorrect. If there are authentication/authorization errors (status codes 401 and 403), the API key is likely no longer valid. Check that the configured OTLP endpoint is the URL of your SUSE Observability instance, prefixed with otlp- and suffixed with :443. For example, the OTLP endpoint for play.stackstate.com is otlp-play.stackstate.com:443.

To ensure the API key is configured correctly, check that (see the configuration sketch after this list):

  1. the secret contains a valid API key (verify this in SUSE Observability)

  2. the secret is used as environment variables on the pod

  3. the bearertokenauth extension is using the correct scheme and the value from the API_KEY environment variable

  4. the bearertokenauth extension is used by the otlp/suse-observability exporter
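
A minimal sketch of a matching collector configuration is shown below. The exporter and extension names follow this guide; the authentication scheme and the API_KEY environment variable are assumptions that should match your collector installation, so verify them against your own values file.

extensions:
  bearertokenauth:
    scheme: SUSEObservability        # scheme as used in the collector installation example (verify yours)
    token: "${env:API_KEY}"          # API key from the secret, exposed as an environment variable

exporters:
  otlp/suse-observability:
    auth:
      authenticator: bearertokenauth
    endpoint: otlp-play.stackstate.com:443   # otlp-<your-instance>:443

service:
  extensions: [ bearertokenauth ]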

Some proxies and firewalls don't work well with gRPC

If the collector needs to send data to SUSE Observability through a proxy or a firewall, it can happen that they block the traffic completely, drop parts of the gRPC messages, or unexpectedly close the long-lived gRPC connection. The easiest fix is to switch from gRPC to HTTP by replacing the otlp/suse-observability exporter configuration, and all references to it, with the otlphttp/suse-observability exporter, which is already configured and ready to use.

In that configuration, <otlp-http-suse-observability-endpoint> is similar to <otlp-suse-observability-endpoint>, but with an otlp-http- prefix instead of otlp-, for example otlp-http-play.stackstate.com. For more details see the collector configuration.
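
For illustration, switching a pipeline from the gRPC exporter to the HTTP exporter can look like the sketch below; the exporter names follow this guide and the endpoint is an example value.

exporters:
  otlphttp/suse-observability:
    auth:
      authenticator: bearertokenauth
    endpoint: https://otlp-http-play.stackstate.com   # <otlp-http-suse-observability-endpoint>

service:
  pipelines:
    traces:
      exporters: [ otlphttp/suse-observability ]      # was: otlp/suse-observability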

The instrumented application cannot send data to the collector

The URL is incorrect or traffic is blocked

If the SDK logs errors about not being able to resolve the collector DNS name, the configured collector URL may be incorrect. In Kubernetes, your application is usually deployed in a different namespace than the collector, so the SDK must be configured with the fully qualified domain name of the collector service: http://<service-name>.<namespace>.svc.cluster.local:4317. In the collector installation steps this was http://opentelemetry-collector.open-telemetry.svc.cluster.local:4317, but if you used a different namespace or release name for the collector it will be different for your situation.

If the SDK logs network connection timeouts, there may be a misconfiguration on the collector or the wrong port may be used. It is also possible that Kubernetes network policies are blocking traffic from your application to the collector; this is best verified with your Kubernetes administrator. Network policies should at least allow TCP traffic on the configured port (4317 and/or 4318) from all of your applications to the collector.
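As a sketch, the collector URL is typically passed to an SDK via the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable on the instrumented container; the service name and namespace below come from the installation example and may differ in your cluster.

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://opentelemetry-collector.open-telemetry.svc.cluster.local:4317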

The language SDK doesn't support gRPC

Not all language SDKs have support for gRPC. If OTLP over gRPC is not supported, it is best to switch to OTLP over HTTP. The SDK Exporter configuration page describes how to make this switch.

The language SDK uses the wrong port

Using the wrong port usually appears as a connection error, but it can also show up as network connections being unexpectedly closed. Make sure the SDK exporter is using the right port when sending data; see the SDK Exporter configuration page.

Kubernetes pods with hostNetwork enabled

The Open Telemetry collector enriches the telemetry data with Kubernetes metadata. As configured, all telemetry data that cannot be enriched is dropped. However, the collector cannot automatically enrich data from pods that run with hostNetwork: true, because pod identification happens using the IP address of the pod, and pods that use the host network use the IP address of the host.

To help the collector identify such a pod, add the k8s.pod.uid attribute to the metadata by instructing the SDK to set it directly. To do this, modify your pod spec and add the following environment variables to your instrumented application container:

env:
  - name: POD_UID
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.uid
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: k8s.pod.uid=$(POD_UID)

If the OTEL_RESOURCE_ATTRIBUTES environment variable is already set, append k8s.pod.uid=$(POD_UID) to it, using a comma as separator; the value is a comma-separated list of attributes.
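
For example, assuming a service.name attribute was already configured:

env:
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: service.name=my-app,k8s.pod.uid=$(POD_UID)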

Node.js application on Google Kubernetes Engine

On GKE only, the Node.js SDK expects the Kubernetes namespace to be set via the NAMESPACE environment variable. If it is not set, the SDK still adds the k8s.namespace.name attribute but with an empty value, which prevents the Kubernetes attributes processor from inserting the correct namespace name. Until this is fixed, the workaround is to update your pod spec and add this environment variable to the instrumented container(s):

env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace

No metrics available for Node.js application

The auto-instrumentation for Node.js, configured via environment variables, only supports traces, at least until this Open Telemetry issue is resolved. To enable metrics from the automatic instrumentation, code changes are needed; follow the instructions in the Open Telemetry documentation to make these changes.

Kubernetes attributes cannot be added

During the installation of the collector, a cluster role and cluster role binding are created in Kubernetes that allow the collector to read metadata from Kubernetes resources. If this fails, or if they are removed later, the collector can no longer query the Kubernetes API. This shows up as errors in the collector log; the errors include the resource types for which the metadata could not be retrieved.

To fix this, re-install the collector with the Helm chart and make sure you have the permissions required to create the cluster role and cluster role binding. Alternatively, ask your cluster administrator to install the collector with the required permissions.
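
For reference, a minimal sketch of the kind of read-only access the collector's Kubernetes attributes processor relies on is shown below. The Helm chart creates equivalent resources for you; the name and the exact resource list here are illustrative only.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes      # illustrative name
rules:
  - apiGroups: [ "" ]
    resources: [ "pods", "namespaces", "nodes" ]
    verbs: [ "get", "watch", "list" ]
  - apiGroups: [ "apps" ]
    resources: [ "replicasets" ]
    verbs: [ "get", "watch", "list" ]

A matching ClusterRoleBinding ties a role like this to the collector's service account.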
