apache-spark mesos marathon

How to find mesos active resource usage per framework?


We are using Mesos 1.20 + Marathon 1.4.3 to run Spark jobs. I am trying to use an algorithm to forecast job resource usage so that we can auto-scale up and down. I can see the dynamic resource usage per framework on the Mesos web page at http://:5050/#/agents/. However, it looks like the endpoints only give me usage per slave, as in the link below:

finding active framework current resource usage in mesos

Is there any way to get a snapshot of the resource usage per framework through a Mesos endpoint?

I also tried this endpoint on the Mesos slave, but it does not seem to return CPU/memory information per framework either.

http://agent-ip:5051/metrics/snapshot/slave(1)/monitor/statistics

{
  "slave/executors_terminated": 114751.0,
  "slave/tasks_finished": 63594.0,
  "slave/cpus_total": 8.0,
  "slave/executors_preempted": 0.0,
  "slave/cpus_percent": 1.0125,
  "slave/executors_running": 8.0,
  "slave/gpus_revocable_used": 0.0,
  "slave/invalid_status_updates": 256.0,
  "slave/executors_registering": 0.0,
  "slave/tasks_gone": 0.0,
  "slave/cpus_revocable_percent": 0.0,
  "slave/gpus_total": 0.0,
  "slave/tasks_killed": 50763.0,
  "slave/tasks_starting": 0.0,
  "slave/registered": 1.0,
  "slave/gpus_revocable_total": 0.0,
....
}

Thanks


Solution

  • To gather this information, query the /slave/monitor/statistics/ endpoint on each agent, collect the metrics for every executor, and group them by framework ID.


    Here is a Diamond Mesos Collector that does this, but it collects data from a single agent only. You can combine the per-agent results in your metrics visualization tool, e.g. Grafana.
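
    If you would rather do the grouping yourself, here is a minimal Python sketch of the idea. It is not the Diamond collector's code. It assumes the agent's statistics endpoint returns a JSON list with one entry per executor, each carrying a framework_id and a statistics object with fields such as cpus_limit, mem_rss_bytes, cpus_user_time_secs and cpus_system_time_secs; check the response from your own agents, since the fields can differ by Mesos version and isolator. The function names, the default /monitor/statistics path and the agent URLs are placeholders for illustration.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_executor_stats(agent_url, path="/monitor/statistics"):
    """Fetch the per-executor statistics list from one agent.

    The default path is an assumption; adjust it to whatever your agents expose.
    """
    with urlopen(agent_url.rstrip("/") + path, timeout=10) as resp:
        return json.load(resp)


def group_by_framework(executors):
    """Sum executor statistics per framework_id (field names are assumed)."""
    totals = defaultdict(lambda: {
        "executors": 0,
        "cpus_limit": 0.0,
        "mem_rss_bytes": 0,
        "cpu_time_secs": 0.0,
    })
    for entry in executors:
        stats = entry.get("statistics", {})
        fw = totals[entry["framework_id"]]
        fw["executors"] += 1
        fw["cpus_limit"] += stats.get("cpus_limit", 0.0)
        fw["mem_rss_bytes"] += stats.get("mem_rss_bytes", 0)
        # Cumulative CPU seconds; take the difference between two samples
        # and divide by the elapsed time to get the CPUs actually in use.
        fw["cpu_time_secs"] += (stats.get("cpus_user_time_secs", 0.0)
                                + stats.get("cpus_system_time_secs", 0.0))
    return dict(totals)


def collect(agent_urls):
    """Query every agent and merge the results into one per-framework view."""
    executors = []
    for url in agent_urls:
        executors.extend(fetch_executor_stats(url))
    return group_by_framework(executors)
```

    Calling collect(["http://agent-1:5051", "http://agent-2:5051"]) (hypothetical hosts) would give one dictionary keyed by framework ID. The memory figure is a point-in-time value, while the CPU figure is a running total, so a forecast needs two snapshots to turn it into a usage rate.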