
Finding active frameworks' current resource usage in Mesos


Which HTTP endpoint will help me find the current resource utilization of all active frameworks?

We need this information because we want to scale our Mesos cluster dynamically, and our algorithm needs to know which resources each active framework is using.


Solution

  • I don't think focusing on the frameworks is really what you want to do. What you're after is probably the Mesos slave utilization, which you can request by calling

    http://{mesos-master}:5050/master/state-summary
    

    In the JSON response, you'll find a slaves property that contains an array of slave objects:

    {
        "hostname": "192.168.0.3",
        "cluster": "mesos-hw-cluster",
        "slaves": [{
            "id": "bd9c29d7-8530-4c5b-8c50-5d2f60dffbf6-S2",
            "pid": "slave(1)@192.168.0.1:5051",
            "hostname": "192.168.0.1",
            "registered_time": 1456826950.99075,
            "resources": {
                "cpus": 12.0,
                "disk": 1840852.0,
                "mem": 63304.0,
                "ports": "[31000-32000]"
            },
            "used_resources": {
                "cpus": 5.75,
                "disk": 0.0,
                "mem": 14376.0,
                "ports": "[31000-31000, 31109-31109, 31267-31267, 31699-31699, 31717-31717, 31907-31907, 31979-31980]"
            },
            "offered_resources": {
                "cpus": 0.0,
                "disk": 0.0,
                "mem": 0.0
            },
            "reserved_resources": {},
            "unreserved_resources": {
                "cpus": 12.0,
                "disk": 1840852.0,
                "mem": 63304.0,
                "ports": "[31000-32000]"
            },
            "attributes": {},
            "active": true,
            "version": "0.27.1",
            "TASK_STAGING": 0,
            "TASK_STARTING": 0,
            "TASK_RUNNING": 7,
            "TASK_FINISHED": 18,
            "TASK_KILLED": 27,
            "TASK_FAILED": 3,
            "TASK_LOST": 0,
            "TASK_ERROR": 0,
            "framework_ids": ["bd9c29d7-8530-4c5b-8c50-5d2f60dffbf6-0000", "bd9c29d7-8530-4c5b-8c50-5d2f60dffbf6-0002"]
        },
        ...
    }
    

    You could iterate over all the slave objects, sum up their resources and their used_resources, and subtract the second sum from the first. The summed used_resources is the overall resource usage of the cluster, and the difference is the capacity that is still free.
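The iteration described above could be sketched roughly as follows in Python. This is only a minimal sketch under a few assumptions: the helper names fetch_state_summary and summarize_slaves are made up for illustration, only the scalar resources (cpus, mem, disk) are summed while the ports range strings are ignored, and the response is assumed to have the shape shown in the sample above.

```python
import json
import urllib.request

# Scalar resources to aggregate; "ports" is a range string and is skipped here.
SCALAR_KEYS = ("cpus", "mem", "disk")


def fetch_state_summary(master_host, port=5050, timeout=10):
    """Fetch and parse /master/state-summary (hypothetical helper name)."""
    url = f"http://{master_host}:{port}/master/state-summary"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_slaves(state):
    """Sum resources and used_resources over all slave objects.

    Returns a dict with "total", "used" and "free" (total minus used),
    each mapping a resource name to a float.
    """
    total = dict.fromkeys(SCALAR_KEYS, 0.0)
    used = dict.fromkeys(SCALAR_KEYS, 0.0)
    for slave in state.get("slaves", []):
        resources = slave.get("resources", {})
        used_resources = slave.get("used_resources", {})
        for key in SCALAR_KEYS:
            # Missing keys are treated as 0 (an assumption of this sketch).
            total[key] += resources.get(key, 0.0)
            used[key] += used_resources.get(key, 0.0)
    free = {key: total[key] - used[key] for key in SCALAR_KEYS}
    return {"total": total, "used": used, "free": free}
```

Calling summarize_slaves(fetch_state_summary("your-master-host")) would then return the three aggregates in one dictionary. The summing logic is kept separate from the HTTP call so it can be checked against a saved JSON response without a running cluster.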

    See