apache-spark, slurm, sparklyr

Obtaining number of nodes, number of cores and available RAM for tuning


I am trying to tune Spark on my HPC cluster (I use Sparklyr), and for that I want to collect some of the important specs mentioned in http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/:

To hopefully make all of this a little more concrete, here’s a worked example of configuring a Spark app to use as much of the cluster as possible: Imagine a cluster with six nodes running NodeManagers, each equipped with 16 cores and 64GB of memory.

namely:

  • number of nodes
  • number of cores
  • disk space and RAM

I know how to use sinfo -n -l, but its output lists so many cores that I cannot easily extract this information from it. Is there a simpler way to get the overall specs of my cluster?

Ultimately, I am trying to find some reasonable values for --num-executors, --executor-cores and --executor-memory.
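
In other words, I eventually want to be able to fill in something like this (the values here are just placeholders to show the shape of the command I am after):

    spark-submit --num-executors <num> --executor-cores <cores> --executor-memory <mem>G ...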


Solution

  • Number of nodes:

    sinfo -O "nodes" --noheader
    

  • Number of cores: Slurm's "cores" are, by default, the number of cores per socket, not the total number of cores available on the node. Somewhat confusingly, in Slurm, cpus = cores * sockets (thus, a machine with two 6-core processors has 2 sockets, 6 cores and 12 cpus).

    Number of cores (= cpus in Slurm), disk space and RAM are trickier to get, as they may differ between nodes. The following returns an easy-to-parse per-node list (see the sketch after this list for turning these numbers into Spark settings):

    sinfo -N -O "nodehost,disk,memory,cpus" --noheader
    

    If all nodes are the same, we can get the info from the first row of sinfo's output:

    Number of cores (=Slurm cpus) per node:

    sinfo -N -O "cpus" --noheader | head -1
    

    RAM per node:

    sinfo -N -O "memory" --noheader | head -1
    

    Disk space per node:

    sinfo -N -O "disk" --noheader | head -1