Auto-instrumentation of Lambdas


Introduction

This document guides you through auto-instrumenting Node.js Lambda functions using OpenTelemetry. Auto-instrumentation simplifies the process of adding observability to your Lambda functions by automatically capturing performance metrics and tracing information.

Prerequisites

Before you begin, ensure you have the following:

  • AWS Lambda function: The function you want to instrument.

  • OpenTelemetry SDK: Installed in your Lambda function.

  • OpenTelemetry Collector: Deployed and configured.

  • SUSE Observability: An account with SUSE Observability where you'll send your telemetry data.

  • Memory: Enough memory to run the Lambda functions, including the added instrumentation.

Values supplied by the environment

OpenTelemetry relies on various configuration values to function correctly. These values control aspects like data collection, exporting, and communication with backend systems. To make your OpenTelemetry deployment flexible and adaptable to different environments, you can provide these settings through environment variables. This approach offers several benefits:

  • Dynamic Configuration: Easily adjust settings without code changes.

  • Environment-Specific Settings: Configure OpenTelemetry differently for development, testing, and production.

  • Secret Management: Securely store sensitive information like API keys.

For the OpenTelemetry setup described in this documentation, you'll need to define the following environment variables:

  • VERBOSITY: Controls the level of detail in OpenTelemetry logs.

  • OTLP_API_KEY: Authenticates your Lambda function to send data to SUSE Observability.

  • OTLP_ENDPOINT: Specifies the address of your SUSE Observability instance.

  • OPENTELEMETRY_COLLECTOR_CONFIG_FILE: Points to the configuration file for the OpenTelemetry Collector.

  • AWS_LAMBDA_EXEC_WRAPPER: Configures the Lambda execution environment to use the OpenTelemetry handler.

  • OTLP_INSTR_LAYER_ARN: Provides the ARN (Amazon Resource Name) of the OpenTelemetry instrumentation layer, which adds the necessary components for auto-instrumentation.

  • OTLP_COLLECTOR_LAYER_ARN: Provides the ARN of the OpenTelemetry collector layer, which is responsible for receiving, processing, and exporting telemetry data.

Important Considerations:

  • gRPC Endpoint: The OTLP_ENDPOINT should specify the gRPC endpoint of your SUSE Observability instance, without an http:// or https:// prefix. Use port 443 for secure communication.

  • Region-Specific Layers: Lambda layers are region-bound. Ensure that the ARNs you use for OTLP_INSTR_LAYER_ARN and OTLP_COLLECTOR_LAYER_ARN match the AWS region where your Lambda function is deployed.

  • Architecture Matching: The OpenTelemetry Collector layer is architecture-specific. Choose the correct ARN for your Lambda function's architecture (e.g., amd64 or arm64).

A complete example is shown below; be aware that you need to replace the placeholder values with your own.

VERBOSITY: "normal"
OTLP_API_KEY: "<your api key for sending data to SUSE Observability here>"
OTLP_ENDPOINT: "<your-dns-name-for-suse-observability-here>:443"
OPENTELEMETRY_COLLECTOR_CONFIG_FILE: "/var/task/collector.yaml"
AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler"
OTLP_INSTR_LAYER_ARN: "arn:aws:lambda:<aws-region>:184161586896:layer:opentelemetry-nodejs-0_11_0:1"
OTLP_COLLECTOR_LAYER_ARN: "arn:aws:lambda:<aws-region>:184161586896:layer:opentelemetry-collector-<amd64|arm64>-0_12_0:1"
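
How these values are attached to a function depends on your deployment tooling. Below is a minimal sketch assuming an AWS SAM template; the resource and parameter names are illustrative, and the layer ARNs are passed in as template parameters rather than environment variables.

# Illustrative AWS SAM fragment (an assumption, not part of the official setup):
# wires the environment variables above into a function and attaches the two layers.
Parameters:
  OtlpInstrLayerArn:
    Type: String        # value of OTLP_INSTR_LAYER_ARN
  OtlpCollectorLayerArn:
    Type: String        # value of OTLP_COLLECTOR_LAYER_ARN

Resources:
  MyInstrumentedFunction:           # hypothetical function name
    Type: AWS::Serverless::Function
    Properties:
      Runtime: nodejs18.x
      Handler: index.handler
      CodeUri: src/                 # collector.yaml is packaged from here
      Layers:
        - !Ref OtlpInstrLayerArn
        - !Ref OtlpCollectorLayerArn
      Environment:
        Variables:
          VERBOSITY: "normal"
          OTLP_API_KEY: "<your api key for sending data to SUSE Observability here>"
          OTLP_ENDPOINT: "<your-dns-name-for-suse-observability-here>:443"
          OPENTELEMETRY_COLLECTOR_CONFIG_FILE: "/var/task/collector.yaml"
          AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler"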

The collector.yaml file

The OpenTelemetry Collector configuration defines how the collected data should be distributed. It lives in the collector.yaml file placed in the src directory where the Lambda files can be found. Below is an example collector.yaml file.

# collector.yaml in the root directory
# Set an environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to
# '/var/task/collector.yaml'

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  debug:
    verbosity: "${env:VERBOSITY}"
  otlp/stackstate:
    headers:
      Authorization: "SUSEObservability ${env:OTLP_API_KEY}"
    endpoint: "${env:OTLP_ENDPOINT}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug, otlp/stackstate]
      processors: []
    metrics:
      receivers: [otlp]
      exporters: [debug, otlp/stackstate]
      processors: []

Be aware that this collector forwards the data to a second collector, which then handles tail sampling, metric aggregation, and similar processing before sending the data on to SUSE Observability. This second collector also needs to run in the customer's environment.

Depending on the desired functionality, or based on factors such as the volume of data generated by Lambdas instrumented in this way, collectors can be set up for batching, tail sampling, and other pre-processing techniques to reduce the impact on SUSE Observability.

See this page for guidance and instructions on setting up a batching collector that acts as a security proxy for SUSE Observability, and this page for instructions on setting up a collector that also performs tail sampling. For more information about processor configuration for the OpenTelemetry Collector, see the official documentation.
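
As a simple illustration of such pre-processing, the fragment below sketches how the standard batch processor could be added to the pipelines of that downstream collector; the exporter name mirrors the Lambda collector.yaml above and is an assumption, not a requirement.

# Illustrative fragment for a downstream (gateway) collector: batch the
# telemetry before it is exported onwards to SUSE Observability.
processors:
  batch:
    send_batch_size: 512
    timeout: 5s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/stackstate]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/stackstate]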

The package.json file

Make sure to add "@opentelemetry/auto-instrumentations-node": "^0.55.2" to the dependencies in package.json and run npm install to add the auto-instrumentation client libraries to your Node.js Lambda.

Troubleshooting

Timeouts

If adding the OTEL Lambda layers results in Lambdas that time out, the logs might indicate that the collector was asked to shut down while still busy, e.g. a log entry such as the following:

{
    "level": "info",
    "ts": 1736867469.2312617,
    "caller": "internal/retry_sender.go:126",
    "msg": "Exporting failed. Will retry the request after interval.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlp/stackstate",
    "error": "rpc error: code = Canceled desc = context canceled",
    "interval": "5.125929689s"
}

shortly after receiving the instruction to shut down:

{
    "level": "info",
    "ts": 1736867468.4311068,
    "logger": "lifecycle.manager",
    "msg": "Received SHUTDOWN event"
}

The above indicates that the resources allocated to the Lambda are not sufficient to run both the Lambda itself and the additional OTEL instrumentation. To remedy this, adjust the memory allocation and Lambda timeout settings as needed so the Lambda can finish its work and the telemetry collection can complete.

Try modifying the MemorySize and Timeout properties of the Lambdas that are failing:

MemorySize: 256
Timeout: 25

Note: the default memory allocation is 128 MB.

Note: the memory increment is 128 MB.

Note: Timeout is an integer value denoting seconds.
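
For example, assuming the function is defined in an AWS SAM or CloudFormation template, these properties sit directly under the function resource; the resource name below is hypothetical.

# Illustrative fragment: raise memory and timeout for a function that times
# out after the OTEL layers are added.
Resources:
  MyInstrumentedFunction:
    Type: AWS::Serverless::Function
    Properties:
      MemorySize: 256   # default is 128 MB
      Timeout: 25       # integer value, in seconds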

Authentication and Source IP Filtering

If you encounter a 403 Unauthorized error when submitting collector data to your cluster, or to any pre-processing or proxy collector, double-check that the source IP address of the VPC NAT gateway matches what is whitelisted by the collector ingress, that the chosen authentication mechanism matches on source and destination, and that credentials (secrets, etc.) are set up correctly.

For more information about configuring authentication for the OpenTelemetry Collector, please refer to the official documentation.
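
As one possible setup, the sketch below shows a receiving (proxy) collector validating the same Authorization header the Lambda exporter sends, using the bearertokenauth extension from the collector-contrib distribution; the scheme and token values are assumptions and must match your exporter configuration.

# Illustrative fragment for a receiving collector that authenticates incoming
# OTLP traffic with the same header the Lambda exporter sends (see collector.yaml above).
extensions:
  bearertokenauth:
    scheme: "SUSEObservability"
    token: "${env:OTLP_API_KEY}"

receivers:
  otlp:
    protocols:
      grpc:
        auth:
          authenticator: bearertokenauth

service:
  extensions: [bearertokenauth]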

References

Auto-instrumentation docs → https://opentelemetry.io/docs/faas/lambda-auto-instrument/

Collector docs → https://opentelemetry.io/docs/faas/lambda-collector/

GitHub Releases Page for finding latest ARNs → https://github.com/open-telemetry/opentelemetry-lambda/releases

OTLP Exporter Configuration → https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/
