StackState-Mesos Integration

Connects Mesos to StackState in order to:

  • Visualize your Mesos cluster performance
  • Correlate the performance of Mesos with the rest of your applications

StackState is able to:

  • Find all tasks (including containers)
  • Correlate all tasks (based on connectivity between them)
  • Report metrics for a Mesos slave and a Mesos master
  • Correlate all services running in Mesos with other components in your IT stack
  • Parse application or service logs to extract and ship relevant events to StackState

For more details about configuring this integration, refer to the following file(s) on GitHub:

Metrics

mesos.framework.cpu
(gauge)
Framework CPU
mesos.framework.mem
(gauge)
Framework memory
shown as mebibyte
mesos.framework.disk
(gauge)
Framework disk
shown as mebibyte
mesos.role.cpu
(gauge)
Role CPU
mesos.role.mem
(gauge)
Role memory
shown as mebibyte
mesos.role.disk
(gauge)
Role disk
shown as mebibyte
mesos.cluster.tasks_error
(gauge)
Number of tasks that were invalid
shown as task
mesos.cluster.tasks_failed
(count)
Number of failed tasks
shown as task
mesos.cluster.tasks_finished
(count)
Number of finished tasks
shown as task
mesos.cluster.tasks_killed
(count)
Number of killed tasks
shown as task
mesos.cluster.tasks_lost
(count)
Number of lost tasks
shown as task
mesos.cluster.tasks_running
(gauge)
Number of running tasks
shown as task
mesos.cluster.tasks_staging
(gauge)
Number of staging tasks
shown as task
mesos.cluster.tasks_starting
(gauge)
Number of starting tasks
shown as task
mesos.cluster.slave_registrations
(gauge)
Number of slaves that were able to cleanly re-join the cluster and connect back to the master after the master is disconnected.
mesos.cluster.slave_removals
(gauge)
Number of slaves removed for various reasons, including maintenance
mesos.cluster.slave_reregistrations
(gauge)
Number of slave re-registrations
mesos.cluster.slave_shutdowns_canceled
(gauge)
Number of cancelled slave shutdowns
mesos.cluster.slave_shutdowns_scheduled
(gauge)
Number of slaves which have failed their health check and are scheduled to be removed
mesos.cluster.slaves_active
(gauge)
Number of active slaves
mesos.cluster.slaves_connected
(gauge)
Number of connected slaves
mesos.cluster.slaves_disconnected
(gauge)
Number of disconnected slaves
mesos.cluster.slaves_inactive
(gauge)
Number of inactive slaves
mesos.cluster.cpus_percent
(gauge)
Percentage of allocated CPUs
shown as percent
mesos.cluster.cpus_used
(gauge)
Number of allocated CPUs
mesos.cluster.cpus_total
(gauge)
Number of CPUs
mesos.cluster.disk_percent
(gauge)
Percentage of allocated disk space
shown as percent
mesos.cluster.disk_used
(gauge)
Allocated disk space
shown as mebibyte
mesos.cluster.disk_total
(gauge)
Disk space
shown as mebibyte
mesos.cluster.mem_percent
(gauge)
Percentage of allocated memory
shown as percent
mesos.cluster.mem_used
(gauge)
Allocated memory
shown as mebibyte
mesos.cluster.mem_total
(gauge)
Total memory
shown as mebibyte
mesos.registrar.queued_operations
(gauge)
Number of queued operations
mesos.registrar.registry_size_bytes
(gauge)
Registry size
shown as byte
mesos.registrar.state_fetch_ms
(gauge)
Registry read latency
shown as millisecond
mesos.registrar.state_store_ms
(gauge)
Registry write latency
shown as millisecond
mesos.registrar.state_store_ms.count
(gauge)
Registry write count
mesos.registrar.state_store_ms.max
(gauge)
Maximum registry write latency
shown as millisecond
mesos.registrar.state_store_ms.min
(gauge)
Minimum registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p50
(gauge)
Median registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p90
(gauge)
90th percentile registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p95
(gauge)
95th percentile registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p99
(gauge)
99th percentile registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p999
(gauge)
99.9th percentile registry write latency
shown as millisecond
mesos.registrar.state_store_ms.p9999
(gauge)
99.99th percentile registry write latency
shown as millisecond
mesos.cluster.frameworks_active
(gauge)
Number of active frameworks
mesos.cluster.frameworks_connected
(gauge)
Number of connected frameworks
mesos.cluster.frameworks_disconnected
(gauge)
Number of disconnected frameworks
mesos.cluster.frameworks_inactive
(gauge)
Number of inactive frameworks
mesos.stats.system.cpus_total
(gauge)
Number of CPUs available
mesos.stats.system.load_15min
(gauge)
Load average for the past 15 minutes
mesos.stats.system.load_1min
(gauge)
Load average for the past minute
mesos.stats.system.load_5min
(gauge)
Load average for the past 5 minutes
mesos.stats.system.mem_free_bytes
(gauge)
Free memory
shown as byte
mesos.stats.system.mem_total_bytes
(gauge)
Total memory
shown as byte
mesos.stats.elected
(gauge)
Whether this is the elected master
mesos.stats.uptime_secs
(gauge)
Uptime
shown as second
mesos.cluster.dropped_messages
(gauge)
Number of dropped messages
shown as message
mesos.cluster.outstanding_offers
(gauge)
Number of outstanding resource offers
mesos.cluster.event_queue_dispatches
(gauge)
Number of dispatches in the event queue
mesos.cluster.event_queue_http_requests
(gauge)
Number of HTTP requests in the event queue
shown as request
mesos.cluster.event_queue_messages
(gauge)
Number of messages in the event queue
shown as message
mesos.cluster.invalid_framework_to_executor_messages
(gauge)
Number of invalid framework messages
shown as message
mesos.cluster.invalid_status_update_acknowledgements
(gauge)
Number of invalid status update acknowledgements
mesos.cluster.invalid_status_updates
(gauge)
Number of invalid status updates
mesos.cluster.valid_framework_to_executor_messages
(gauge)
Number of valid framework messages
shown as message
mesos.cluster.valid_status_update_acknowledgements
(gauge)
Number of valid status update acknowledgements
mesos.cluster.valid_status_updates
(gauge)
Number of valid status updates
mesos.state.task.cpu
(gauge)
Task CPU
mesos.state.task.mem
(gauge)
Task memory
shown as mebibyte
mesos.state.task.disk
(gauge)
Task disk
shown as mebibyte
mesos.slave.tasks_failed
(count)
Number of failed tasks
shown as task
mesos.slave.tasks_finished
(count)
Number of finished tasks
shown as task
mesos.slave.tasks_killed
(count)
Number of killed tasks
shown as task
mesos.slave.tasks_lost
(count)
Number of lost tasks
shown as task
mesos.slave.tasks_running
(gauge)
Number of running tasks
shown as task
mesos.slave.tasks_staging
(gauge)
Number of staging tasks
shown as task
mesos.slave.tasks_starting
(gauge)
Number of starting tasks
shown as task
mesos.stats.system.cpus_total
(gauge)
Number of CPUs available
mesos.stats.system.load_15min
(gauge)
Load average for the past 15 minutes
mesos.stats.system.load_1min
(gauge)
Load average for the past minute
mesos.stats.system.load_5min
(gauge)
Load average for the past 5 minutes
mesos.stats.system.mem_free_bytes
(gauge)
Free memory
shown as byte
mesos.stats.system.mem_total_bytes
(gauge)
Total memory
shown as byte
mesos.stats.registered
(gauge)
Whether this slave is registered with a master
mesos.stats.uptime_secs
(gauge)
Slave uptime
mesos.slave.cpus_percent
(gauge)
Percentage of allocated CPUs
shown as percent
mesos.slave.cpus_used
(gauge)
Number of allocated CPUs
mesos.slave.cpus_total
(gauge)
Number of CPUs
mesos.slave.disk_percent
(gauge)
Percentage of allocated disk space
shown as percent
mesos.slave.disk_used
(gauge)
Allocated disk space
shown as mebibyte
mesos.slave.disk_total
(gauge)
Disk space
shown as mebibyte
mesos.slave.mem_percent
(gauge)
Percentage of allocated memory
shown as percent
mesos.slave.mem_used
(gauge)
Allocated memory
shown as mebibyte
mesos.slave.mem_total
(gauge)
Total memory
shown as mebibyte
mesos.slave.executors_registering
(gauge)
Number of executors registering
mesos.slave.executors_running
(gauge)
Number of executors running
mesos.slave.executors_terminated
(gauge)
Number of terminated executors
mesos.slave.executors_terminating
(gauge)
Number of terminating executors
mesos.slave.frameworks_active
(gauge)
Number of active frameworks
mesos.slave.invalid_framework_messages
(gauge)
Number of invalid framework messages
shown as message
mesos.slave.invalid_status_updates
(gauge)
Number of invalid status updates
mesos.slave.recovery_errors
(gauge)
Number of errors encountered during slave recovery
shown as error
mesos.slave.valid_framework_messages
(gauge)
Number of valid framework messages
shown as message
mesos.slave.valid_status_updates
(gauge)
Number of valid status updates
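
Example: relating the used, total, and percent gauges

The resource gauges above come in groups of three (for example mesos.cluster.cpus_used, mesos.cluster.cpus_total, and mesos.cluster.cpus_percent). The sketch below shows one way to derive an allocation percentage from the used and total gauges so it can be compared against the reported percent gauge. It is a minimal illustration, not the integration's own code: the helper name allocation_percent, the samples dictionary, and its values are all hypothetical, and whether the reported percent gauges are scaled exactly this way is an assumption you should check against your own data.

```python
def allocation_percent(samples, scope, resource):
    """Return used/total as a percentage, or None if either gauge is
    missing or the total is zero.

    `samples` is a hypothetical dict of metric name -> latest value.
    `scope` is "cluster" or "slave"; `resource` is "cpus", "mem" or "disk".
    """
    used = samples.get(f"mesos.{scope}.{resource}_used")
    total = samples.get(f"mesos.{scope}.{resource}_total")
    if used is None or not total:
        return None
    return 100.0 * used / total


# Hypothetical sample values, for illustration only.
samples = {
    "mesos.cluster.cpus_used": 6.0,
    "mesos.cluster.cpus_total": 24.0,
    "mesos.cluster.mem_used": 2048.0,   # mebibytes
    "mesos.cluster.mem_total": 8192.0,  # mebibytes
}

cpu_pct = allocation_percent(samples, "cluster", "cpus")
mem_pct = allocation_percent(samples, "cluster", "mem")
disk_pct = allocation_percent(samples, "cluster", "disk")  # gauges absent
print(cpu_pct, mem_pct, disk_pct)
```

The helper returns None rather than raising when a gauge is absent, so a missing metric (for example on a slave that has not yet reported) does not break a dashboard calculation.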