I am new to Spark and working through the Learning Spark book. I have a question about how Spark fetches/reads data from an external source.
Suppose I have an external data source (not partitioned) that I want to read into Spark across 10 partitions. Will the first partition scan the entire data source and keep only the rows BETWEEN 1000 AND 2000, and will the second partition then scan the entire source again and keep only the rows BETWEEN 2000 AND 3000?
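For concreteness, here is roughly the kind of read I mean: a minimal PySpark sketch of a partitioned JDBC read, where the URL, table name, column, and credentials are all made up by me for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

# Hypothetical connection details; the bounds mirror the numbers in my question.
# With numPartitions=10 and a stride of 1000, I understand the first partition
# covers roughly order_id BETWEEN 1000 AND 2000, the second 2000 to 3000, etc.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")  # made-up URL
      .option("dbtable", "orders")                          # made-up table
      .option("user", "dbuser")
      .option("password", "dbpass")
      .option("partitionColumn", "order_id")  # must be numeric/date/timestamp
      .option("lowerBound", "1000")
      .option("upperBound", "11000")
      .option("numPartitions", "10")
      .load())
```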
Also, will there be 10 separate Spark sessions handling these partitions in parallel? If not, how does a single session read them in parallel?
And will each partition be stored on a separate executor?
I searched online but could not find a satisfactory explanation.