Stackstate-TokuMX Integration

Overview

Capture TokuMX metrics in Stackstate to:

  • Visualize key TokuMX metrics.
  • Correlate TokuMX performance with the rest of your applications.

Installation

  1. Install the Python MongoDB module on your MongoDB server using the following command:

    sudo pip install --upgrade "pymongo<3.0"
    
  2. You can verify that the module is installed using this command:

    python -c "import pymongo" 2>&1 | grep ImportError && \
    echo -e "\033[0;31mpymongo python module - Missing\033[0m" || \
    echo -e "\033[0;32mpymongo python module - OK\033[0m"
    
  3. Start the mongo shell.
  4. Create a read-only admin user for Stackstate using the following commands, replacing <UNIQUEPASSWORD> with a unique password for the user. Stackstate needs admin rights to collect complete server statistics.

    use admin
    db.auth("admin", "admin-password")
    db.addUser("stackstate", "<UNIQUEPASSWORD>", true)
    
  5. Verify that the user was created by running the following command from your system shell (not the mongo shell).

    python -c 'from pymongo import Connection; print Connection().admin.authenticate("stackstate", "<UNIQUEPASSWORD>")' | \
    grep True && \
    echo -e "\033[0;32mstackstate user - OK\033[0m" || \
    echo -e "\033[0;31mstackstate user - Missing\033[0m"
    

For more details about creating and managing users in MongoDB, refer to the MongoDB documentation.
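
The version pin in step 1 matters because the Connection class used in step 5 was removed in pymongo 3.0. As a rough sketch (written for Python 3; the helper names version_tuple, satisfies_pin, and check_pymongo are illustrative and not part of pymongo or the Agent), the installed version could be checked against the pin like this:

```python
import importlib


def version_tuple(version):
    """Turn a version string such as '2.9.5' into (2, 9, 5).

    Parsing stops at the first non-numeric character, so '2.8.1+' gives (2, 8, 1).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    # Pad to major.minor so that '3' compares the same as '3.0'.
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def satisfies_pin(version, upper=(3, 0)):
    """True if the version is strictly below the upper bound (the "<3.0" pin)."""
    return version_tuple(version) < upper


def check_pymongo():
    """Report whether pymongo is importable and older than 3.0."""
    try:
        pymongo = importlib.import_module("pymongo")
    except ImportError:
        return "pymongo python module - Missing"
    if satisfies_pin(pymongo.version):
        return "pymongo %s - OK" % pymongo.version
    return "pymongo %s - does not satisfy the <3.0 pin" % pymongo.version


if __name__ == "__main__":
    print(check_pymongo())
```

This only approximates how pip compares versions; it is meant as a quick sanity check, not a replacement for the commands above.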

Configuration

Configure the Agent to connect to your TokuMX instance using your new Stackstate user.

  1. Edit the tokumx.yaml file in your Agent’s conf.d directory:

    init_config:
    
    # Specify the MongoDB URI, with database to use for reporting (defaults to "admin")
    # E.g. mongodb://stackstate:LnCbkX4uhpuLHSUrcayEoAZA@localhost:27017/my-db
    instances:
          -   server: mongodb://stackstate:<UNIQUEPASSWORD>@localhost:27017
              tags:
                  - mytag1
                  - mytag2
          -   server: mongodb://stackstate:<UNIQUEPASSWORD>@localhost:27017
              tags:
                  - mytag1
                  - mytag2
    
  2. Restart the Agent.
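
If the password contains characters that are reserved in a URI, such as @, : or /, they generally need to be percent-encoded before being placed in the server value. The following is a minimal sketch, assuming Python 3; build_server_uri is a hypothetical helper, not part of the Agent, and the password shown is made up:

```python
from urllib.parse import quote_plus


def build_server_uri(user, password, host="localhost", port=27017, database=None):
    """Return a mongodb:// URI with the user name and password percent-encoded."""
    uri = "mongodb://%s:%s@%s:%d" % (
        quote_plus(user),
        quote_plus(password),
        host,
        port,
    )
    # The database is optional; the config comment notes that it defaults to "admin".
    if database:
        uri += "/" + database
    return uri


if __name__ == "__main__":
    # Hypothetical password containing reserved characters.
    print(build_server_uri("stackstate", "p@ss:w/rd", database="my-db"))
    # prints mongodb://stackstate:p%40ss%3Aw%2Frd@localhost:27017/my-db
```

The resulting string can be pasted as the server value in tokumx.yaml.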

Validation

  1. To validate that your integration is working, run the Agent’s info command. You should see output similar to the following:

    Checks
    ======
    
      [...]
    
      tokumx
      ------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
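
For scripted validation, the same check can be done by parsing the text of the info output. The sketch below assumes the output has the layout shown above (a "tokumx" heading with a dashed underline followed by "- ..." lines); tokumx_status is a hypothetical helper, and the real output may differ between Agent versions:

```python
import re

# Sample text copied from the expected output above.
SAMPLE = """\
Checks
======

  [...]

  tokumx
  ------
      - instance #0 [OK]
      - Collected 8 metrics & 0 events
"""


def tokumx_status(info_text):
    """Return (all_instances_ok, metrics_collected), or None if no tokumx section exists."""
    lines = info_text.splitlines()
    start = None
    # Locate the heading: a line reading "tokumx" followed by a dashed underline.
    for i in range(len(lines) - 1):
        if lines[i].strip() == "tokumx" and set(lines[i + 1].strip()) == {"-"}:
            start = i + 2
            break
    if start is None:
        return None

    statuses = []
    metrics = None
    for line in lines[start:]:
        text = line.strip()
        if not text.startswith("-"):
            break  # a blank line or the next heading ends the section
        match = re.search(r"instance #\d+ \[(\w+)\]", text)
        if match:
            statuses.append(match.group(1))
        match = re.search(r"Collected (\d+) metrics", text)
        if match:
            metrics = int(match.group(1))
    all_ok = bool(statuses) and all(status == "OK" for status in statuses)
    return (all_ok, metrics)


if __name__ == "__main__":
    print(tokumx_status(SAMPLE))  # prints (True, 8)
```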
    

Metrics

tokumx.asserts.msgps
(gauge)
The number of message assertions raised per second.
shown as assertion/second
tokumx.asserts.regularps
(gauge)
The number of regular assertions raised per second.
shown as assertion/second
tokumx.asserts.rolloversps
(gauge)
The number of times per second that the assertion counters roll over. The counters roll over to zero after 2^30 assertions.
shown as assertion/second
tokumx.asserts.userps
(gauge)
The number of user assertions raised per second.
shown as assertion/second
tokumx.asserts.warningps
(gauge)
The number of warnings raised per second.
shown as assertion/second
tokumx.connections.available
(gauge)
The number of unused available incoming connections the database can provide.
shown as connection
tokumx.connections.current
(gauge)
The number of connections to the database server from clients.
shown as connection
tokumx.cursors.timedOut
(gauge)
The total number of cursors that have timed out since the server process started.
shown as cursor
tokumx.cursors.totalOpen
(gauge)
The number of cursors that TokuMX is maintaining for clients.
shown as cursor
tokumx.ft.alerts.checkpointFailures
(gauge)
The number of checkpoints that have failed for any reason.
shown as event
tokumx.ft.alerts.locktreeRequestsPending
(gauge)
The number of requests for document-level locks in the locktree that are waiting for other requests to release their locks.
shown as request
tokumx.ft.alerts.longWaitEvents.cachePressure.countps
(gauge)
Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
shown as event/second
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps
(gauge)
Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed.
shown as fraction
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps
(gauge)
Rate at which the begin checkpoint phase of checkpoint has run (these should be fairly quick).
shown as event/second
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
shown as fraction
tokumx.ft.alerts.longWaitEvents.fsync.countps
(gauge)
Rate at which fsync operations took more than 1 second.
shown as event/second
tokumx.ft.alerts.longWaitEvents.fsync.timeps
(gauge)
Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second.
shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree.
shown as event/second
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree.
shown as fraction
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps
(gauge)
Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
shown as event/second
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps
(gauge)
Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation.
shown as fraction
tokumx.ft.alerts.longWaitEvents.logBufferWaitps
(gauge)
Rate at which a writing client had to wait more than 100ms for access to the log buffer.
shown as event/second
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps
(gauge)
Rate of full evictions of leaf nodes.
shown as byte/second
tokumx.ft.cachetable.evictions.full.leaf.clean.countps
(gauge)
Rate of full evictions of leaf nodes.
shown as event/second
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
shown as byte/second
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps
(gauge)
Rate of full evictions of leaf nodes that need to be written back to disk.
shown as event/second
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
shown as fraction
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps
(gauge)
Rate of full evictions of nonleaf nodes.
shown as byte/second
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps
(gauge)
Rate of full evictions of nonleaf nodes.
shown as event/second
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
shown as byte/second
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps
(gauge)
Rate of full evictions of nonleaf nodes that need to be written back to disk.
shown as event/second
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps
(gauge)
Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk.
shown as fraction
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps
(gauge)
Rate of partial evictions of leaf nodes.
shown as byte/second
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps
(gauge)
Rate of partial evictions of leaf nodes.
shown as event/second
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps
(gauge)
Rate of partial evictions of nonleaf nodes.
shown as byte/second
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps
(gauge)
Rate of partial evictions of nonleaf nodes.
shown as event/second
tokumx.ft.cachetable.miss.countps
(gauge)
Rate of internal cache misses. This metric is similar to MongoDB’s btree misses and page faults.
shown as miss/second
tokumx.ft.cachetable.miss.full.countps
(gauge)
Rate of full internal cache misses.
shown as miss/second
tokumx.ft.cachetable.miss.full.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss.
shown as fraction
tokumx.ft.cachetable.miss.partial.countps
(gauge)
Rate of partial internal cache misses.
shown as miss/second
tokumx.ft.cachetable.miss.partial.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss.
shown as fraction
tokumx.ft.cachetable.miss.timeps
(gauge)
Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses.
shown as fraction
tokumx.ft.cachetable.size.current
(gauge)
Total amount of uncompressed data currently in the database's internal cache.
shown as byte
tokumx.ft.cachetable.size.limit
(gauge)
Total amount of uncompressed data that will fit in TokuMX’s internal cache.
shown as byte
tokumx.ft.cachetable.size.writing
(gauge)
Total size of nodes that are currently queued up to be written to disk for eviction.
shown as byte
tokumx.ft.checkpoint.begin.timeps
(gauge)
Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads.
shown as fraction
tokumx.ft.checkpoint.countps
(gauge)
Rate at which checkpoints are completed.
shown as event/second
tokumx.ft.checkpoint.lastComplete.time
(gauge)
The time spent, in seconds, by the most recently completed checkpoint.
shown as second
tokumx.ft.checkpoint.timeps
(gauge)
Fraction of time (seconds/second) spent doing checkpoints.
shown as fraction
tokumx.ft.checkpoint.write.leaf.bytes.compressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, after compression.
shown as byte/second
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints, before compression.
shown as byte/second
tokumx.ft.checkpoint.write.leaf.countps
(gauge)
The rate at which leaf nodes are written to disk during checkpoints.
shown as write/second
tokumx.ft.checkpoint.write.leaf.timeps
(gauge)
The fraction of time spent writing leaf nodes to disk during checkpoints.
shown as fraction
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, after compression.
shown as byte/second
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints, before compression.
shown as byte/second
tokumx.ft.checkpoint.write.nonleaf.countps
(gauge)
The rate at which nonleaf nodes are written to disk during checkpoints.
shown as write/second
tokumx.ft.checkpoint.write.nonleaf.timeps
(gauge)
The fraction of time spent writing nonleaf nodes to disk during checkpoints.
shown as fraction
tokumx.ft.compressionRatio.leaf
(gauge)
The size ratio of leaf nodes before and after compression.
shown as fraction
tokumx.ft.compressionRatio.nonleaf
(gauge)
The size ratio of nonleaf nodes before and after compression.
shown as fraction
tokumx.ft.compressionRatio.overall
(gauge)
The size ratio of nodes before and after compression.
shown as fraction
tokumx.ft.fsync.countps
(gauge)
The rate at which the database flushed the operating system’s file buffers to disk.
shown as operation/second
tokumx.ft.fsync.timeps
(gauge)
The fraction of time (microseconds/second) used to fsync to disk.
shown as fraction
tokumx.ft.locktree.size.current
(gauge)
Total memory the locktree is currently using.
shown as byte
tokumx.ft.locktree.size.limit
(gauge)
Maximum number of bytes that the locktree is allowed to use.
shown as byte
tokumx.ft.log.bytesps
(gauge)
The rate at which the logger writes to disk.
shown as byte/second
tokumx.ft.log.countps
(gauge)
The rate of individual log writes.
shown as write/second
tokumx.ft.log.timeps
(gauge)
The fraction of time spent performing log writes.
shown as fraction
tokumx.ft.serializeTime.leaf.compressps
(gauge)
Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
shown as fraction
tokumx.ft.serializeTime.leaf.decompressps
(gauge)
Fraction of time spent decompressing leaf nodes and their partitions after reading them off disk.
shown as fraction
tokumx.ft.serializeTime.leaf.deserializeps
(gauge)
Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk.
shown as fraction
tokumx.ft.serializeTime.leaf.serializeps
(gauge)
Fraction of time spent serializing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
shown as fraction
tokumx.ft.serializeTime.nonleaf.compressps
(gauge)
Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
shown as fraction
tokumx.ft.serializeTime.nonleaf.decompressps
(gauge)
Fraction of time spent decompressing nonleaf nodes and their partitions after reading them off disk.
shown as fraction
tokumx.ft.serializeTime.nonleaf.deserializeps
(gauge)
Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk.
shown as fraction
tokumx.ft.serializeTime.nonleaf.serializeps
(gauge)
Fraction of time spent serializing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty).
shown as fraction
tokumx.mem.resident
(gauge)
The amount of memory currently used by the database process.
shown as mebibyte
tokumx.mem.virtual
(gauge)
The amount of virtual memory used by the database process.
shown as mebibyte
tokumx.metrics.document.deletedps
(gauge)
The number of documents deleted per second.
shown as document/second
tokumx.metrics.document.insertedps
(gauge)
The number of documents inserted per second.
shown as document/second
tokumx.metrics.document.returnedps
(gauge)
The number of documents returned by queries per second.
shown as document/second
tokumx.metrics.document.updatedps
(gauge)
The number of documents updated per second.
shown as document/second
tokumx.metrics.getLastError.wtime.numps
(gauge)
The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
shown as operation/second
tokumx.metrics.getLastError.wtime.totalMillisps
(gauge)
The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.
shown as fraction
tokumx.metrics.getLastError.wtimeoutsps
(gauge)
The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError.
shown as event/second
tokumx.metrics.operation.idhackps
(gauge)
The rate of queries that contain the _id field.
shown as query/second
tokumx.metrics.operation.scanAndOrderps
(gauge)
The rate of queries that return sorted results but cannot use an index to perform the sort operation.
shown as query/second
tokumx.metrics.queryExecutor.scannedps
(gauge)
The rate of index items scanned during queries and query-plan evaluation.
shown as operation/second
tokumx.metrics.repl.apply.batches.numps
(gauge)
The number of batches applied across all databases per second.
shown as operation/second
tokumx.metrics.repl.apply.batches.totalMillisps
(gauge)
The fraction of time (ms/s) spent applying operations from the oplog.
shown as fraction
tokumx.metrics.repl.apply.opsps
(gauge)
The rate of oplog operations.
shown as operation/second
tokumx.metrics.repl.buffer.count
(gauge)
The number of operations in the oplog buffer.
shown as operation
tokumx.metrics.repl.buffer.sizeBytes
(gauge)
The current size of the contents of the oplog buffer.
shown as byte
tokumx.metrics.repl.network.bytesps
(gauge)
The rate at which data is read from the replication sync source.
shown as byte/second
tokumx.metrics.repl.network.getmores.numps
(gauge)
The rate of getmore operations.
shown as operation/second
tokumx.metrics.repl.network.getmores.totalMillisps
(gauge)
The fraction of time (ms/s) spent collecting data from getmore operations.
shown as fraction
tokumx.metrics.repl.network.opsps
(gauge)
The rate of operations read from the replication source.
shown as operation/second
tokumx.metrics.repl.network.readersCreatedps
(gauge)
The rate at which oplog query processes are created.
shown as process/second
tokumx.metrics.repl.oplog.insert.numps
(gauge)
The rate at which operations are inserted into the oplog.
shown as operation/second
tokumx.metrics.repl.oplog.insert.totalMillisps
(gauge)
The fraction of time (ms/s) spent inserting operations into the oplog.
shown as fraction
tokumx.metrics.repl.oplog.insertBytesps
(gauge)
The rate (in bytes) at which data is inserted into the oplog.
shown as byte/second
tokumx.metrics.ttl.deletedDocumentsps
(gauge)
The rate at which documents are deleted from collections with a ttl index.
shown as document/second
tokumx.metrics.ttl.passesps
(gauge)
The number of times per second the background process removes documents from collections with a ttl index.
shown as event/second
tokumx.opcounters.commandps
(gauge)
The total number of commands per second issued to the database.
shown as command/second
tokumx.opcounters.deleteps
(gauge)
The number of delete operations per second.
shown as operation/second
tokumx.opcounters.getmoreps
(gauge)
The number of getmore operations per second.
shown as operation/second
tokumx.opcounters.insertps
(gauge)
The number of insert operations per second.
shown as operation/second
tokumx.opcounters.queryps
(gauge)
The total number of queries per second.
shown as query/second
tokumx.opcounters.updateps
(gauge)
The number of update operations per second.
shown as operation/second
tokumx.opcountersRepl.commandps
(gauge)
The total number of replicated commands issued to the database per second.
shown as command/second
tokumx.opcountersRepl.deleteps
(gauge)
The number of replicated delete operations per second.
shown as operation/second
tokumx.opcountersRepl.getmoreps
(gauge)
The number of replicated getmore operations per second.
shown as operation/second
tokumx.opcountersRepl.insertps
(gauge)
The number of replicated insert operations per second.
shown as operation/second
tokumx.opcountersRepl.queryps
(gauge)
The total number of replicated queries per second.
shown as query/second
tokumx.opcountersRepl.updateps
(gauge)
The number of replicated update operations per second.
shown as operation/second
tokumx.stats.coll.count
(gauge)
The number of objects or documents in this collection.
shown as document
tokumx.stats.coll.nindexes
(gauge)
The number of indexes on this collection.
shown as index
tokumx.stats.coll.nindexesbeingbuilt
(gauge)
The number of indexes currently being built.
shown as index
tokumx.stats.coll.size
(gauge)
The total size in memory of all records in a collection. Does not include the record header, but does include the record’s padding. Does not include the size of any indexes associated with the collection.
shown as byte
tokumx.stats.coll.storageSize
(gauge)
The total amount of storage allocated to this collection for document storage.
shown as byte
tokumx.stats.coll.totalIndexSize
(gauge)
The total size of all indexes on this collection.
shown as byte
tokumx.stats.coll.totalIndexStorageSize
(gauge)
The total size on disk of all indexes on this collection (after compression).
shown as byte
tokumx.stats.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
shown as byte
tokumx.stats.db.avgObjSize
(gauge)
The average size of each document.
shown as byte
tokumx.stats.db.collections
(gauge)
The number of collections in the database.
tokumx.stats.db.dataSize
(gauge)
The total size of the data held in this database including the padding factor.
shown as byte
tokumx.stats.db.indexes
(gauge)
The total number of indexes across all collections in the database.
shown as index
tokumx.stats.db.indexSize
(gauge)
The total size of all indexes created on this database.
shown as byte
tokumx.stats.db.indexStorageSize
(gauge)
The total size on disk of all indexes created on this database (after compression).
shown as byte
tokumx.stats.db.objects
(gauge)
The number of documents in the database across all collections.
shown as document
tokumx.stats.db.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
shown as byte
tokumx.stats.idx.avgObjSize
(gauge)
The average size of each index entry.
shown as byte
tokumx.stats.idx.count
(gauge)
The number of documents in this index.
shown as index
tokumx.stats.idx.deletes
(gauge)
The number of delete operations performed on this index.
shown as operation
tokumx.stats.idx.inserts
(gauge)
The number of insert operations performed on this index.
shown as operation
tokumx.stats.idx.nscanned
(gauge)
The number of index entries scanned for queries using this index.
shown as index
tokumx.stats.idx.nscannedObjects
(gauge)
The number of collection objects examined after scanning an index entry for a query using this index.
shown as object
tokumx.stats.idx.queries
(gauge)
The number of query operations performed using this index.
shown as query
tokumx.stats.idx.size
(gauge)
The total size of this index.
shown as byte
tokumx.stats.idx.storageSize
(gauge)
The total size on disk of this index (after compression).
shown as byte
tokumx.stats.indexes
(gauge)
The total number of indexes across all collections in the database.
shown as index
tokumx.stats.indexSize
(gauge)
The total size of all indexes created on this database.
shown as byte
tokumx.stats.objects
(gauge)
The number of documents in the database across all collections.
shown as document
tokumx.stats.storageSize
(gauge)
The total amount of space allocated to collections in this database for document storage.
shown as byte
tokumx.uptime
(gauge)
The time that the TokuMX process has been active.
shown as second
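
Many of the metrics above end in "ps" and are described as per-second rates, and several are expressed as a fraction of time in microseconds/second or ms/s. As an illustration of the arithmetic only (this is an assumption about how such values can be derived from two samples of an increasing counter, not a description of the Agent's internals; per_second and busy_fraction are hypothetical names):

```python
MICROS_PER_SECOND = 1000000.0
MILLIS_PER_SECOND = 1000.0


def per_second(prev_value, curr_value, prev_ts, curr_ts):
    """Rate of change between two samples of a monotonically increasing counter.

    Returns None when no time has elapsed or the counter went backwards
    (for example after a server restart), since no meaningful rate exists.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / float(elapsed)


def busy_fraction(rate, units_per_second):
    """Convert a time rate (e.g. microseconds/second) into a 0..1 fraction."""
    return rate / units_per_second


if __name__ == "__main__":
    # Made-up samples: cumulative fsync time grows from 4,000,000 to
    # 4,150,000 microseconds over a 15 second interval.
    rate = per_second(4000000, 4150000, 100, 115)
    print(rate)                                    # 10000.0 microseconds/second
    print(busy_fraction(rate, MICROS_PER_SECOND))  # 0.01, i.e. 1% of the time
```

Read this way, a value of 10000 for a microseconds/second metric means that roughly 1% of wall-clock time was spent in that activity.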