Tags: apache-spark, spark-graphx

Apache Spark: Reading a file in standalone cluster mode


I am currently using a graph that I load from a file when I run my GraphX application locally.

I'd like to run the application in standalone cluster mode.

Do I have to make changes, such as placing the file on each cluster node? Or can I leave my application unchanged and keep the file only on the driver?

Thank you.


Solution

  • In order for the executors to access an input file, the file must be accessible from every node in the cluster.

    The preferred way is to read the file from a storage layer that all nodes can reach, e.g. HDFS or Cassandra (see the sketch after this list).

    Placing a copy of the file at the same path on every node might also work, but it isn't the recommended approach.
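
With GraphX's built-in GraphLoader, the only change is usually the path you pass in. Below is a minimal sketch in Scala; the HDFS namenode address and file paths are placeholders you would replace with your cluster's values.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

object LoadGraphFromHdfs {
  def main(args: Array[String]): Unit = {
    // No master is hardcoded; pass it via spark-submit --master
    val conf = new SparkConf().setAppName("LoadGraphFromHdfs")
    val sc = new SparkContext(conf)

    // Local path: only works in cluster mode if the file exists
    // at this exact path on EVERY worker node.
    // val graph = GraphLoader.edgeListFile(sc, "file:///data/edges.txt")

    // Shared storage: every executor reads from HDFS, so nothing
    // needs to be copied to individual nodes.
    // (namenode host/port and path below are placeholders)
    val graph = GraphLoader.edgeListFile(sc, "hdfs://namenode:9000/data/edges.txt")

    println(s"vertices: ${graph.vertices.count()}, edges: ${graph.edges.count()}")
    sc.stop()
  }
}
```

Submitted with spark-submit, the same jar then runs unchanged whether the master is `local[*]` or `spark://...`; only the input URI differs between local and cluster runs.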