How to properly check resource usage of an AWS EMR cluster (master and core nodes) from a notebook


Here are my cluster details:

Master : Running 1 m4.xlarge
Core   : Running 3 m4.xlarge
Task   : --
Cluster scaling: Not enabled

I am using notebooks to practice PySpark, and I would like to know how the resources are being utilised, to assess whether they are under-utilised or not enough for my tasks. As part of that, when checking RAM/memory usage, here is what I got from the terminal:

notebook@ip-xxx-xxx-xxx-xxx ~$ free -h
      total  used  free  shared  buff/cache  available
Mem:   1.9G  456M  759M     72K        741M       1.4G
Swap:    0B    0B    0B

Each m4.xlarge instance comes with 16 GB of memory. What's happening, and why are only about 2 GB of the 16 GB shown? And how do I properly find out how much of my CPU, memory, and storage is actually being used? (Yes, to reduce costs!!)


Solution

The terminal in an EMR notebook runs in the notebook's own environment (a small container on the notebook instance), not on the master node, which is why free -h reports about 2 GB instead of the 16 GB of an m4.xlarge. To see what the cluster itself is using:

  • If you want to check memory and CPU utilization, you can check that in CloudWatch with the instance ID.

    • To get the instance ID of a node, go to Hardware -> Instance Group -> Instances in the EMR console.
      By default CloudWatch reports CPU, disk, and network per instance (per-instance memory needs the CloudWatch agent); EMR additionally publishes cluster-level metrics such as YARNMemoryAvailablePercentage. A sketch of querying these follows below.
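    For example, a minimal boto3 sketch; the region, instance ID, and cluster ID below are placeholders to copy from the console, and AWS credentials are assumed to be configured:

import datetime

import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

# Placeholder -- copy the real ID from Hardware -> Instance Group -> Instances.
INSTANCE_ID = "i-0123456789abcdef0"

now = datetime.datetime.utcnow()
window = dict(StartTime=now - datetime.timedelta(hours=1), EndTime=now,
              Period=300, Statistics=["Average"])

# Per-instance CPU (published by default for every EC2 instance).
cpu = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    **window,
)

# Cluster-level memory headroom published by EMR itself.
mem = cw.get_metric_statistics(
    Namespace="AWS/ElasticMapReduce",
    MetricName="YARNMemoryAvailablePercentage",
    Dimensions=[{"Name": "JobFlowId", "Value": "j-XXXXXXXXXXXXX"}],  # placeholder cluster ID
    **window,
)

for label, resp in [("CPU %", cpu), ("YARN mem avail %", mem)]:
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(label, point["Timestamp"], round(point["Average"], 1))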

    Another option is the YARN ResourceManager UI; the default URL is http://master-node-ip:8088.

    There you can get metrics at the job level as well as the node level. The same numbers are also exposed through the ResourceManager REST API, as in the sketch below.
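    A minimal sketch using the requests library, assuming the master node is reachable on port 8088 (master-node-ip is the placeholder from above):

import requests

# Replace with the master node's IP or DNS name.
RM = "http://master-node-ip:8088"

# Cluster-wide totals from the ResourceManager REST API.
m = requests.get(f"{RM}/ws/v1/cluster/metrics").json()["clusterMetrics"]
print(f"Memory: {m['allocatedMB']} MB allocated of {m['totalMB']} MB total")
print(f"vCores: {m['allocatedVirtualCores']} of {m['totalVirtualCores']}")

# Per-node breakdown: one entry per node running a NodeManager.
nodes = requests.get(f"{RM}/ws/v1/cluster/nodes").json()["nodes"]["node"]
for node in nodes:
    total_mb = node["usedMemoryMB"] + node["availMemoryMB"]
    print(f"{node['nodeHostName']}: {node['usedMemoryMB']} MB used of {total_mb} MB")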