Tags: apache-spark, multi-tenant

Is it possible to know the resources used by a specific Spark job?


I'm exploring the idea of using a multi-tenant Spark cluster that executes jobs on demand for specific tenants.

Is it possible to "know" the specific resources used by a specific job (for payment reasons)? E.g. if a job requires that several nodes in kubernetes is automatically allocated is it then possible to track which Spark jobs (and tenant at the end) that initiated these resource allocations? Or, jobs are always evenly spread out on allocated resources?

I tried to find information on the Apache Spark site and elsewhere on the internet, without success.


Solution

  • See https://spark.apache.org/docs/latest/monitoring.html

    You can pull data from the Spark History Server as JSON via its REST API and write your own resource-accounting logic on top of it (see the sketch below).

    Note that what you are calling a "job" is a Spark application: in Spark terminology, a job is a unit of work inside an application, and the application is what the cluster allocates executors (and, on Kubernetes, pods) for.
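By way of illustration, here is a minimal sketch in Python against the documented `/api/v1` endpoints of the History Server's REST API. It sums per-application executor task time as a crude usage metric. The server address, the tenant-encoded-in-app-name convention, and the use of `totalDuration` as a billing proxy are all assumptions you would adapt to your setup:

```python
# A minimal sketch, assuming a Spark History Server reachable at
# HISTORY_SERVER and that "resources used" can be approximated by
# cumulative executor task time. Endpoint paths follow the Spark
# monitoring REST API (see the docs linked above).
import requests

HISTORY_SERVER = "http://history-server:18080"  # assumption: your server's address


def completed_applications():
    """List completed Spark applications known to the History Server."""
    resp = requests.get(
        f"{HISTORY_SERVER}/api/v1/applications",
        params={"status": "completed"},
    )
    resp.raise_for_status()
    return resp.json()


def app_task_time_ms(app_id):
    """Sum totalDuration (cumulative task time, in ms) over all
    executors of one application, including dead ones.

    Using task time as a proxy for 'resources consumed' is an
    assumption for this sketch, not an official billing metric.
    """
    resp = requests.get(
        f"{HISTORY_SERVER}/api/v1/applications/{app_id}/allexecutors"
    )
    resp.raise_for_status()
    return sum(e["totalDuration"] for e in resp.json())


if __name__ == "__main__":
    for app in completed_applications():
        # Assumption: the tenant is encoded in the application name,
        # e.g. "tenantA-etl"; adapt to however you tag applications.
        usage = app_task_time_ms(app["id"])
        print(app["name"], app["id"], f"{usage} task-ms")
```

If an application has multiple attempts, the executor endpoint takes the attempt id as an extra path segment (`/applications/[app-id]/[attempt-id]/allexecutors`). On Kubernetes, tagging executor pods per tenant (e.g. via the `spark.kubernetes.executor.label.*` configuration) gives you a second, cluster-side view to cross-check this accounting against what was actually allocated.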