Tags: sql, time, apache-spark, ibm-cloud

How to measure the execution time of a query on Spark


I need to measure the execution time of a query on Apache Spark (Bluemix). What I tried:

import time

startTimeQuery = time.clock()
df = sqlContext.sql(query)
df.show()
endTimeQuery = time.clock()
runTimeQuery = endTimeQuery - startTimeQuery

Is this a good way to do it? The time I get looks too small compared to how long it actually takes for the table to appear.


Solution

  • Update: No, using the time package is not the best way to measure the execution time of Spark jobs. The most convenient and accurate way I know of is to use the Spark History Server.

    On Bluemix, in your notebooks go to the "Palette" on the right side. Choose the "Environment" panel and you will see a link to the Spark History Server, where you can investigate the performed Spark jobs, including their computation times.
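
    If you still want a rough in-notebook measurement, a minimal sketch (reusing sqlContext and query from the question, and assuming the result set is small enough to count) could look like this:

    import time

    # time.time() gives wall-clock seconds; time.clock() measures CPU time
    # on the driver process, which is likely why the original numbers looked too small.
    startTimeQuery = time.time()
    df = sqlContext.sql(query)
    rowCount = df.count()  # an action that forces the whole query to run;
                           # show() may only compute enough partitions for the first rows
    runTimeQuery = time.time() - startTimeQuery

    print("%d rows in %.2f seconds" % (rowCount, runTimeQuery))

    The History Server numbers remain the authoritative ones, since they cover each job and stage as executed on the cluster.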