StackState-Cassandra Integration

Overview

Connect Cassandra to StackState to:

  • Visualize the performance of your clusters in real time
  • Correlate the performance of Cassandra with the rest of your applications

For information on JMX checks, see the JMX Checks documentation at http://docs.stackstate.com/integrations/java/.

Setup

Installation

To capture Cassandra metrics, install the StackState Agent. Metrics are captured over a JMX connection. We recommend Oracle’s JDK for this integration.

This check has a limit of 350 metrics per instance. The number of returned metrics is shown on the info page. You can choose which metrics to collect by editing the configuration below; the JMX Checks documentation has detailed instructions on customizing them. If you need to monitor more metrics, email us at info@stackstate.com.

  1. Configure the Agent to connect to Cassandra by editing conf.d/cassandra.yaml:

    instances:
      - host: localhost
        port: 7199
        user: username
        password: password
        name: cassandra_instance
        # Optional: set these if SSL is enabled
        # trust_store_path: /path/to/trustStore.jks
        # trust_store_password: password
        # Optional: set this if the Agent cannot find your Java executable
        # java_bin_path: /path/to/java
    
    # List of metrics to be collected by the integration
    # Visit http://docs.stackstate.com/integrations/java/ to customize it
    init_config:
      conf:
        - include:
            domain: org.apache.cassandra.metrics
            type: ClientRequest
            scope:
              - Read
              - Write
            name:
              - Latency
              - Timeouts
              - Unavailables
            attribute:
              - Count
              - OneMinuteRate
        - include:
            domain: org.apache.cassandra.metrics
            type: ClientRequest
            scope:
              - Read
              - Write
            name:
              - TotalLatency
        - include:
            domain: org.apache.cassandra.metrics
            type: Storage
            name:
              - Load
              - Exceptions
        - include:
            domain: org.apache.cassandra.metrics
            type: ColumnFamily
            bean_regex:
              - .*keyspace=.*
            name:
              - TotalDiskSpaceUsed
              - BloomFilterDiskSpaceUsed
              - BloomFilterFalsePositives
              - BloomFilterFalseRatio
              - CompressionRatio
              - LiveDiskSpaceUsed
              - LiveSSTableCount
              - MaxRowSize
              - MeanRowSize
              - MemtableColumnsCount
              - MemtableLiveDataSize
              - MemtableSwitchCount
              - MinRowSize
          exclude:
            keyspace:
              - OpsCenter
              - system
              - system_auth
              - system_distributed
              - system_schema
              - system_traces
        - include:
            domain: org.apache.cassandra.metrics
            type: Cache
            name:
              - Capacity
              - Size
            attribute:
              - Value
        - include:
            domain: org.apache.cassandra.metrics
            type: Cache
            name:
              - Hits
              - Requests
            attribute:
              - Count
        - include:
            domain: org.apache.cassandra.metrics
            type: ThreadPools
            path: request
            name:
              - ActiveTasks
              - CompletedTasks
              - PendingTasks
              - CurrentlyBlockedTasks
        - include:
            domain: org.apache.cassandra.db
            attribute:
              - UpdateInterval
    

  2. Restart the Agent
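
As a rough sketch of how the metric limit and the optional settings might be applied together, the following conf.d/cassandra.yaml enables the trust store options for an SSL-protected JMX port and keeps only a small set of metrics. The credentials, trust store path, and trust store password are placeholders, and the choice of which metrics to keep is an assumption made for illustration; every key is reused from the configuration in step 1.

```yaml
instances:
  - host: localhost
    port: 7199
    user: username                              # placeholder credentials
    password: password
    name: cassandra_instance
    trust_store_path: /path/to/trustStore.jks   # placeholder path
    trust_store_password: password              # placeholder password

init_config:
  conf:
    # Client read and write requests: counts and one-minute rates only
    - include:
        domain: org.apache.cassandra.metrics
        type: ClientRequest
        scope:
          - Read
          - Write
        name:
          - Latency
        attribute:
          - Count
          - OneMinuteRate
    # Disk usage per column family, skipping two of the system keyspaces
    - include:
        domain: org.apache.cassandra.metrics
        type: ColumnFamily
        bean_regex:
          - .*keyspace=.*
        name:
          - TotalDiskSpaceUsed
          - LiveDiskSpaceUsed
      exclude:
        keyspace:
          - system
          - system_auth
```

Fewer include blocks mean fewer collected metrics, which helps an instance stay under the 350-metric limit. As with any change to this file, restart the Agent afterwards.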

Validation

Execute the info command and verify that the integration check has passed. The output of the command should contain a section similar to the following:

Checks
======

  [...]

  cassandra
  ---------
      - instance #0 [OK]
      - Collected 8 metrics & 0 events

Data Collected

Metrics

cassandra.active_tasks
(gauge)
The number of tasks that the thread pool is actively executing.
shown as task
cassandra.bloom_filter_disk_space_used
(gauge)
Disk space used by the Bloom filters.
shown as byte
cassandra.bloom_filter_false_positives
(gauge)
The number of Bloom filter false positives.
shown as event
cassandra.bloom_filter_false_ratio
(gauge)
The ratio of Bloom filter false positives to total checks.
shown as fraction
cassandra.capacity
(gauge)
The capacity of the caches, such as the key cache and row cache.
shown as byte
cassandra.completed_tasks
(gauge)
The number of tasks that the thread pool has completed.
shown as task
cassandra.compression_ratio
(gauge)
The compression ratio for all SSTables in a column family.
shown as fraction
cassandra.currently_blocked_tasks.count
(gauge)
The number of currently blocked tasks for the thread pool.
shown as task
cassandra.exceptions.count
(gauge)
The number of exceptions thrown.
shown as error
cassandra.hits.count
(gauge)
The number of hits to a cache.
shown as hit
cassandra.latency.count
(gauge)
The number of client requests.
shown as request
cassandra.latency.one_minute_rate
(gauge)
Recent rate of client requests, as an exponentially weighted moving average over a one-minute interval.
shown as request
cassandra.live_disk_space_used.count
(gauge)
Disk space used by "live" SSTables (only counts non-obsolete files).
shown as byte
cassandra.live_ss_table_count
(gauge)
Number of "live" (non-obsolete) SSTables.
shown as file
cassandra.load.count
(gauge)
Disk space used on a node.
shown as byte
cassandra.max_row_size
(gauge)
Size of the largest compacted row.
shown as byte
cassandra.mean_row_size
(gauge)
Average size of compacted rows.
shown as byte
cassandra.memtable_columns_count
(gauge)
Number of columns in memtable.
shown as column
cassandra.memtable_live_data_size
(gauge)
Size of data stored in memtable.
shown as byte
cassandra.memtable_switch_count.count
(gauge)
Number of times a full memtable has been switched out for an empty one due to flushing.
shown as event
cassandra.min_row_size
(gauge)
Size of the smallest compacted row.
shown as byte
cassandra.pending_tasks
(gauge)
The number of pending tasks for the thread pool.
shown as task
cassandra.requests.count
(gauge)
The number of requests to a cache.
shown as request
cassandra.size
(gauge)
Size of cache.
shown as byte
cassandra.timeouts.count
(gauge)
Count of requests not acknowledged within the configurable timeout window.
shown as timeout
cassandra.timeouts.one_minute_rate
(gauge)
Recent timeout rate, as an exponentially weighted moving average over a one-minute interval.
shown as timeout
cassandra.total_disk_space_used.count
(gauge)
Disk space used by a column family.
shown as byte
cassandra.total_latency.count
(gauge)
Total latency for all client requests.
shown as microsecond
cassandra.unavailables.count
(gauge)
Count of requests for which the required number of nodes was unavailable.
shown as error
cassandra.unavailables.one_minute_rate
(gauge)
Recent rate of unavailable exceptions, as an exponentially weighted moving average over a one-minute interval.
shown as error
cassandra.db.update_interval
(gauge)
The configurable update interval for the dynamic snitch, which monitors read latency to route requests away from slow nodes.
shown as millisecond
cassandra.db.write_count
(gauge)
The number of local write requests for a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.latency.count instead)
shown as write
cassandra.db.read_count
(gauge)
The number of local read requests for a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.latency.count instead)
shown as read
cassandra.db.live_ss_table_count
(gauge)
Number of "live" (non-obsolete) SSTables. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.live_ss_table_count instead)
shown as file
cassandra.db.total_disk_space_used
(gauge)
Disk space used by a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_disk_space_used.count instead)
shown as byte
cassandra.db.memtable_data_size
(gauge)
Size of data stored in memtable. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_live_data_size instead)
shown as byte
cassandra.internal.currently_blocked_tasks
(gauge)
The number of currently blocked tasks for the thread pool. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.currently_blocked_tasks.count instead)
shown as task
cassandra.db.max_row_size
(gauge)
Size of the largest compacted row. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.max_row_size instead)
shown as byte
cassandra.db.live_disk_space_used
(gauge)
Disk space used by "live" SSTables (only counts non-obsolete files). (Metric may not be available for Cassandra versions > 2.2. Use cassandra.live_disk_space_used.count instead)
shown as byte
cassandra.internal.active_count
(gauge)
The number of tasks that the thread pool is actively executing. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.active_tasks instead)
shown as task
cassandra.internal.completed_tasks
(gauge)
The number of tasks that the thread pool has completed. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.completed_tasks instead)
shown as task
cassandra.db.total_write_latency_micros
(gauge)
Total latency for all write requests. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_latency.count instead)
shown as microsecond
cassandra.db.total_read_latency_micros
(gauge)
Total latency for all read requests. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_latency.count instead)
shown as microsecond
cassandra.internal.total_blocked_tasks
(gauge)
The cumulative total of currently blocked tasks for the thread pool. (Metric may not be available for Cassandra versions > 2.2.)
shown as task
cassandra.db.mean_row_size
(gauge)
Average size of compacted rows. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.mean_row_size instead)
shown as byte
cassandra.db.compression_ratio
(gauge)
The compression ratio for all SSTables in a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.compression_ratio instead)
shown as fraction
cassandra.db.memtable_switch_count
(gauge)
Number of times a full memtable has been switched out for an empty one due to flushing. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_switch_count.count instead)
shown as event
cassandra.db.memtable_columns_count
(gauge)
Number of columns in memtable. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_columns_count instead)
shown as column
cassandra.db.min_row_size
(gauge)
Size of the smallest compacted row. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.min_row_size instead)
shown as byte
cassandra.db.bloom_filter_false_positives
(gauge)
The number of Bloom filter false positives. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_false_positives instead)
shown as event
cassandra.db.bloom_filter_disk_space_used
(gauge)
Disk space used by the Bloom filters. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_disk_space_used instead)
shown as byte
cassandra.db.bloom_filter_false_ratio
(gauge)
The ratio of Bloom filter false positives to total checks. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_false_ratio instead)
shown as fraction
cassandra.net.total_timeouts
(gauge)
Count of requests not acknowledged within the configurable timeout window. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.timeouts.count instead)
shown as timeout
cassandra.db.completed_tasks
(gauge)
Completed compaction or commitlog tasks. (Metric may not be available for Cassandra versions > 2.2.)
shown as task
cassandra.db.pending_tasks
(gauge)
Pending compaction, commitlog, or column family tasks. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.pending_tasks instead)
shown as task
cassandra.db.load
(gauge)
Disk space used on a node. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.load.count instead)
shown as byte
cassandra.db.exception_count
(gauge)
The number of exceptions thrown. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.exceptions.count instead)
shown as error
cassandra.db.recent_read_latency_micros
(gauge)
The latency of reads since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
shown as microsecond
cassandra.db.recent_write_latency_micros
(gauge)
The latency of writes since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
shown as microsecond
cassandra.db.key_cache_recent_hit_rate
(gauge)
Ratio of key cache hits to key cache requests since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
shown as fraction
cassandra.db.recent_range_latency_micros
(gauge)
The latency of range scans since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
shown as microsecond
cassandra.db.total_range_latency_micros
(gauge)
Total latency for all range scans. (Metric may not be available for Cassandra versions > 2.2.)
shown as microsecond
cassandra.db.write_operations
(gauge)
Count of write operations. (Metric may not be available for Cassandra versions > 2.2.)
shown as operation
cassandra.db.read_operations
(gauge)
Count of read operations. (Metric may not be available for Cassandra versions > 2.2.)
shown as operation
cassandra.db.range_operations
(gauge)
Count of range scan operations. (Metric may not be available for Cassandra versions > 2.2.)
shown as operation
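
Many of the cassandra.db.* metrics above are flagged as possibly unavailable on Cassandra versions later than 2.2. For older clusters, a filter along the following lines might collect some of them. This is only a sketch: it reuses the org.apache.cassandra.db domain and the UpdateInterval attribute from the configuration in the Setup section, while the attribute names ReadCount and WriteCount are assumptions inferred from the metric names cassandra.db.read_count and cassandra.db.write_count. Check them against the MBeans your Cassandra version actually exposes before relying on them.

```yaml
init_config:
  conf:
    - include:
        domain: org.apache.cassandra.db
        attribute:
          - UpdateInterval   # taken from the configuration in the Setup section
          - ReadCount        # assumed attribute name, not confirmed by this page
          - WriteCount       # assumed attribute name, not confirmed by this page
```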