python, apache-spark, cluster-computing, luigi

Why does the Spark driver read a local file?


I use a Spark standalone cluster.

The master and a single slave (worker) run on the same server (Server B).

I use Luigi (on Server A) to submit my application in client deploy mode.

My application reads local files on Server B. However, the application also tries to read the files on Server A. Why?

sc.textFile('/path/to/the/file/*')

Solution

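  • In client mode, the driver is launched in the same process as the client that submits the application. Here that client is Luigi on Server A, so the driver runs on Server A. The driver resolves the path passed to sc.textFile in order to plan the job's partitions, which is why the files are also looked up on Server A.

    In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, which in this setup means Server B.

    You should use cluster mode.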
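As a rough illustration of the difference, the deploy mode is chosen with the `--deploy-mode` option of `spark-submit`. The sketch below only builds the command line. The helper name `build_spark_submit`, the master URL `spark://server-b:7077`, and the application path are hypothetical placeholders, not values from the question. Note also that the Spark documentation states that standalone clusters do not support cluster mode for Python applications, so check whether cluster mode is available for your combination of cluster manager and language before relying on it.

```python
def build_spark_submit(master_url, app_path, deploy_mode="client"):
    """Return the spark-submit argument list for the given deploy mode.

    deploy_mode="client"  -> driver runs on the submitting machine
    deploy_mode="cluster" -> driver runs on a worker inside the cluster
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", deploy_mode,
        app_path,
    ]


# Hypothetical values: replace with your own master URL and application.
cmd = build_spark_submit(
    "spark://server-b:7077",
    "/path/to/application",
    deploy_mode="cluster",
)
print(" ".join(cmd))
```

If cluster mode is not an option, the path given to sc.textFile has to be readable from the machine where the driver runs as well as from the workers.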