apache-spark, apache-spark-mllib, apache-spark-1.5

Spark job execution time


This might be a very simple question, but is there any simple way to measure the execution time of a Spark job (submitted using spark-submit)?

It would help us profile Spark jobs based on the size of the input data.

EDIT: I use http://[driver]:4040 to monitor my jobs, but this web UI shuts down the moment my job finishes.


Solution

  • Every SparkContext launches its own instance of the web UI, which is available at

    http://[driver]:4040

    by default (the port can be changed with spark.ui.port).
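
    For example, a spark-submit invocation along these lines moves the UI to another port (the main class and JAR names are hypothetical placeholders):

        # Serve the application UI on port 4050 instead of the default 4040
        # (com.example.MyApp and my-app.jar are placeholders for your own application)
        spark-submit \
          --class com.example.MyApp \
          --master local[4] \
          --conf spark.ui.port=4050 \
          my-app.jar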

    It offers pages (tabs) with the following information:

    Jobs, Stages, Storage (with RDD size and memory use), Environment, Executors, SQL

    By default, this information is available only while the application is running.

    Tip: You can inspect the web UI after the application has finished by enabling spark.eventLog.enabled and viewing the recorded events through the Spark History Server.
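
    A minimal sketch of that setup, assuming a placeholder log directory hdfs:///spark-event-logs (adjust to your environment):

        # In conf/spark-defaults.conf (or passed with --conf to spark-submit):
        spark.eventLog.enabled           true
        # Placeholder directory; it must exist and be writable by the application
        spark.eventLog.dir               hdfs:///spark-event-logs
        # Directory the history server reads finished-application logs from
        spark.history.fs.logDirectory    hdfs:///spark-event-logs

        # Start the Spark History Server (UI at http://<history-host>:18080 by default),
        # where the finished application's duration is shown alongside its replayed UI
        ./sbin/start-history-server.sh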

    Sample web UI where you can see the job duration as 3.2 hours: [screenshot]