I have a question that I haven't been able to find an answer to anywhere.
I am using the following lines to load data within a PySpark application:
loadFile = self.tableName + ".csv"
dfInput = self.sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(loadFile)
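As an aside, in Spark 2.x+ the CSV reader is built in, so the com.databricks.spark.csv package is no longer required. A minimal equivalent sketch, assuming a SparkSession is available and with "my_table.csv" standing in for self.tableName + ".csv":

from pyspark.sql import SparkSession

# Minimal sketch, assuming Spark 2.x+; "my_table.csv" is a placeholder
# for self.tableName + ".csv".
spark = SparkSession.builder.getOrCreate()
dfInput = spark.read.option("header", "true").csv("my_table.csv")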
My cluster configuration is as follows:
In Apache Spark Standalone mode, how does the process of loading partitions into RAM work?
Or is it neither of these, and am I missing something here? How can I observe this process for myself (a monitoring tool, a Unix command, somewhere in Spark)?
Any comment or resource that would help me dig deeper into this would be very welcome. Thanks in advance.
The second scenario is correct: each executor accesses storage and loads its partitions into its own RAM (Storage --> executor's RAM). The driver only schedules the tasks; the data itself never passes through the driver's memory.
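To see this for yourself, here is a minimal sketch (my own illustration, not from the original question) that forces a read and points you at the Spark UI; the app name and file path are placeholders:

from pyspark.sql import SparkSession

# Minimal sketch; "partition-read-demo" and "my_table.csv" are placeholders.
spark = SparkSession.builder.appName("partition-read-demo").getOrCreate()
df = spark.read.option("header", "true").csv("my_table.csv")

# Each partition of the input becomes one task, and tasks run on executors.
print("partitions:", df.rdd.getNumPartitions())

# Reads are lazy, so trigger a job to make the load actually happen.
df.count()

While the job runs, open the Spark UI (http://<driver-host>:4040 by default) and look at Stages -> Tasks: the Executor ID and Host columns show which executor processed each partition. You can also watch per-executor memory usage on the Executors tab, or run a tool like htop on each worker node to confirm the data is read there rather than through the driver.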