
What is the use of --driver-class-path in the spark command?


According to the Spark docs:

To get started you will need to include the JDBC driver for your particular database on the Spark classpath. For example, to connect to Postgres from the Spark Shell you would run the following command:

bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

However, the job works fine without --driver-class-path. So what is the use of --driver-class-path in the spark command?


Solution

  • --driver-class-path or spark.driver.extraClassPath can be used to modify the classpath of the Spark driver only. This is useful for libraries that are not required by the executors (for example, any code that is used only locally).

    By comparison, --jars or spark.jars not only adds the jars to both the driver and executor classpaths, but also distributes them across the cluster. If a particular jar is used only by the driver, this is unnecessary overhead.
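A minimal shell sketch of how the two options might be filled in for a mix of driver-only and cluster-wide jars. It relies on the spark-submit help text: --jars takes a comma-separated list whose jars are automatically included in the driver classpath, while --driver-class-path takes ordinary JVM classpath entries (separated by ':' on Linux/macOS). The join_by helper and every jar name except the PostgreSQL driver are hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: assemble spark-shell arguments from two (hypothetical) jar lists.
# --jars expects a comma-separated list; --driver-class-path expects plain
# JVM classpath entries, which are separated by ':' on Linux/macOS.

# join_by SEP ITEM... : print the ITEMs joined by SEP (hypothetical helper)
join_by() {
  sep="$1"
  shift
  out=""
  for item in "$@"; do
    if [ -z "$out" ]; then
      out="$item"
    else
      out="$out$sep$item"
    fi
  done
  printf '%s\n' "$out"
}

# Hypothetical jars used only by driver-side code: classpath entries only, not shipped.
driver_cp="$(join_by : local-report-utils.jar local-config.jar)"

# Jars the executors also need: listed in --jars, which also puts them on the driver classpath.
jars_csv="$(join_by , postgresql-9.4.1207.jar)"

# Print (do not run) the resulting command line.
echo "bin/spark-shell --driver-class-path $driver_cp --jars $jars_csv"
```

Because --jars entries already land on the driver classpath, this is likely also why the job in the question runs without --driver-class-path. Keep in mind that --driver-class-path only adds classpath entries and copies nothing, so the listed paths must exist on the machine where the driver actually runs.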