Stackstate-Spark Integration

Overview

Get metrics from your Spark applications in real time to:

  • Visualize performance metrics

Configuration

Install the Stackstate Agent on the master node where the YARN ResourceManager is running.

  1. Configure the Agent to connect to the ResourceManager by editing conf.d/spark.yaml:

    init_config:
    
    instances:
      #
      # The Spark check retrieves metrics from YARN's ResourceManager. This
      # check must be run from the Master Node and the ResourceManager URI must
      # be specified below. The ResourceManager URI is composed of the
      # ResourceManager's hostname and port.
      #
      # The ResourceManager hostname can be found in the yarn-site.xml conf file
      # under the property yarn.resourcemanager.address.
      #
      # The ResourceManager port can be found in the yarn-site.xml conf file under
      # the property yarn.resourcemanager.webapp.address.
      #
      - resourcemanager_uri: http://localhost:8088
    
        # An optional friendly name can be specified for the cluster.
        # cluster_name: MySparkCluster
    
        # Additional tags can be specified for the metrics.
        # tags:
        #   - optional_tag1
        #   - optional_tag2
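
The comments in this file describe the ResourceManager URI as the ResourceManager's hostname (from yarn.resourcemanager.address) combined with the web port (from yarn.resourcemanager.webapp.address). The Python sketch below derives the URI that way from a yarn-site.xml document. It is not part of the Agent: the function names are hypothetical, and it assumes both properties hold literal host:port values rather than ${...} references to other properties.

```python
import xml.etree.ElementTree as ET


def read_yarn_properties(xml_text: str) -> dict:
    """Return a {name: value} dict built from the <property> entries of a yarn-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resourcemanager_uri(props: dict, scheme: str = "http") -> str:
    """Combine the RM hostname (yarn.resourcemanager.address) with the
    web UI port (yarn.resourcemanager.webapp.address).

    Assumes both values are literal "host:port" strings, not ${...} references.
    """
    host = props["yarn.resourcemanager.address"].rsplit(":", 1)[0]
    port = props["yarn.resourcemanager.webapp.address"].rsplit(":", 1)[1]
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical yarn-site.xml content; replace with your cluster's file.
    sample = """<configuration>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>rm-master.example.com:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>rm-master.example.com:8088</value>
      </property>
    </configuration>"""
    print(resourcemanager_uri(read_yarn_properties(sample)))  # prints http://rm-master.example.com:8088
```

To use it against a real cluster, pass the contents of your yarn-site.xml (its location varies by Hadoop distribution) to read_yarn_properties and copy the result into resourcemanager_uri.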
    

For more details about configuring this integration, refer to the integration's configuration files on GitHub.

Validation

  1. Restart the Agent.
  2. Run the Agent's info command and verify that the integration check passed. The output should contain a section similar to the following:

    Checks
    ======
    
      [...]
    
      spark
      -----
        - instance #0 [OK]
        - Collected 0 metrics, 0 events & 2 service checks
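
Before restarting the Agent, you may want to confirm that the configured resourcemanager_uri is reachable from the master node. The Python sketch below queries YARN's ResourceManager REST API (GET /ws/v1/cluster/apps, filtered with states=RUNNING and applicationTypes=SPARK) and lists the running Spark applications it reports. It is a standalone check written under that assumption, not the Agent's own code, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def running_spark_apps_url(resourcemanager_uri: str) -> str:
    """Build the ResourceManager REST URL that lists running Spark applications."""
    query = urllib.parse.urlencode({"states": "RUNNING", "applicationTypes": "SPARK"})
    return f"{resourcemanager_uri.rstrip('/')}/ws/v1/cluster/apps?{query}"


def parse_apps(payload: dict) -> list:
    """Extract (id, name) pairs; YARN returns {"apps": null} when nothing matches."""
    apps = (payload.get("apps") or {}).get("app") or []
    return [(app["id"], app["name"]) for app in apps]


def list_running_spark_apps(resourcemanager_uri: str, timeout: float = 5.0) -> list:
    """Query the ResourceManager and return the running Spark applications it reports."""
    url = running_spark_apps_url(resourcemanager_uri)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_apps(json.load(resp))
```

For example, list_running_spark_apps("http://localhost:8088") returns a list of (application ID, name) pairs. A connection error points at the URI or the network; an empty list means the ResourceManager answered but reported no running Spark applications.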
    

Metrics

Each entry gives the metric name, its type and unit, and a description.

spark.job.num_tasks (rate, task/second): Number of tasks in the application
spark.job.num_active_tasks (rate, task/second): Number of active tasks in the application
spark.job.num_skipped_tasks (rate, task/second): Number of skipped tasks in the application
spark.job.num_failed_tasks (rate, task/second): Number of failed tasks in the application
spark.job.num_active_stages (rate, stage/second): Number of active stages in the application
spark.job.num_completed_stages (rate, stage/second): Number of completed stages in the application
spark.job.num_skipped_stages (rate, stage/second): Number of skipped stages in the application
spark.job.num_failed_stages (rate, stage/second): Number of failed stages in the application

spark.stage.num_active_tasks (rate, task/second): Number of active tasks in the application's stages
spark.stage.num_complete_tasks (rate, task/second): Number of completed tasks in the application's stages
spark.stage.num_failed_tasks (rate, task/second): Number of failed tasks in the application's stages
spark.stage.executor_run_time (gauge, fraction): Fraction of time (ms/s) spent by the executor in the application's stages
spark.stage.input_bytes (rate, byte/second): Input bytes in the application's stages
spark.stage.input_records (rate, record/second): Input records in the application's stages
spark.stage.output_bytes (rate, byte/second): Output bytes in the application's stages
spark.stage.output_records (rate, record/second): Output records in the application's stages
spark.stage.shuffle_read_bytes (rate, byte/second): Number of bytes read during a shuffle in the application's stages
spark.stage.shuffle_read_records (rate, record/second): Number of records read during a shuffle in the application's stages
spark.stage.shuffle_write_bytes (rate, byte/second): Number of bytes written during a shuffle in the application's stages
spark.stage.shuffle_write_records (rate, record/second): Number of records written during a shuffle in the application's stages
spark.stage.memory_bytes_spilled (rate, byte/second): Number of bytes spilled to disk in the application's stages
spark.stage.disk_bytes_spilled (rate, byte/second): Max size on disk of the spilled bytes in the application's stages

spark.driver.rdd_blocks (rate, block/second): Number of RDD blocks in the driver
spark.driver.memory_used (rate, byte/second): Amount of memory used in the driver
spark.driver.disk_used (rate, byte/second): Amount of disk used in the driver
spark.driver.active_tasks (rate, task/second): Number of active tasks in the driver
spark.driver.failed_tasks (rate, task/second): Number of failed tasks in the driver
spark.driver.completed_tasks (rate, task/second): Number of completed tasks in the driver
spark.driver.total_tasks (rate, task/second): Total number of tasks in the driver
spark.driver.total_duration (gauge, fraction): Fraction of time (ms/s) spent by the driver
spark.driver.total_input_bytes (rate, byte/second): Number of input bytes in the driver
spark.driver.total_shuffle_read (rate, byte/second): Number of bytes read during a shuffle in the driver
spark.driver.total_shuffle_write (rate, byte/second): Number of bytes written during a shuffle in the driver
spark.driver.max_memory (rate, byte/second): Maximum memory used in the driver

spark.executor.rdd_blocks (rate, block/second): Number of persisted RDD blocks in the application's executors
spark.executor.memory_used (rate, byte/second): Amount of memory used for cached RDDs in the application's executors
spark.executor.disk_used (rate, byte/second): Amount of disk space used by persisted RDDs in the application's executors
spark.executor.active_tasks (rate, task/second): Number of active tasks in the application's executors
spark.executor.failed_tasks (rate, task/second): Number of failed tasks in the application's executors
spark.executor.completed_tasks (rate, task/second): Number of completed tasks in the application's executors
spark.executor.total_tasks (rate, task/second): Total number of tasks in the application's executors
spark.executor.total_duration (gauge, fraction): Fraction of time (ms/s) spent by the application's executors executing tasks
spark.executor.total_input_bytes (rate, byte/second): Total number of input bytes in the application's executors
spark.executor.total_shuffle_read (rate, byte/second): Total number of bytes read during a shuffle in the application's executors
spark.executor.total_shuffle_write (rate, byte/second): Total number of bytes written during a shuffle in the application's executors
spark.executor_memory (rate, byte/second): Maximum memory available for caching RDD blocks in the application's executors

spark.rdd.num_partitions (rate, per second): Number of persisted RDD partitions in the application
spark.rdd.num_cached_partitions (rate, per second): Number of in-memory cached RDD partitions in the application
spark.rdd.memory_used (rate, byte/second): Amount of memory used in the application's persisted RDDs
spark.rdd.disk_used (rate, byte/second): Amount of disk space used by persisted RDDs in the application
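
Most of these metrics are rates (units per second), and the time-based gauges are expressed as milliseconds of work per second of wall-clock time. As a generic illustration of that arithmetic, and not a description of how the Agent aggregates values internally, the Python sketch below computes a per-second rate from two samples of a cumulative counter and converts ms/s into a fraction. The function names and sample numbers are hypothetical.

```python
from typing import Optional


def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> Optional[float]:
    """Per-second rate of change between two samples of a cumulative counter.

    Timestamps are in seconds. Returns None when the interval is not positive
    or the counter decreased (for example, after the application restarted).
    """
    elapsed = curr_ts - prev_ts
    delta = curr_value - prev_value
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


def ms_per_second_to_fraction(ms_per_second: float) -> float:
    """Convert milliseconds of work per second of wall-clock time into a fraction."""
    return ms_per_second / 1000.0


if __name__ == "__main__":
    # Hypothetical samples: cumulative executor task time in ms, taken 15 s apart.
    rate = per_second_rate(120_000, 0.0, 126_000, 15.0)
    print(rate, ms_per_second_to_fraction(rate))  # prints 400.0 0.4
```

In this example, cumulative task time grows by 6,000 ms over 15 seconds, which is 400 ms of work per second, or a fraction of 0.4.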