apache-spark, stage

In Apache Spark, do tasks in the same stage run simultaneously or not?


Do tasks in the same stage run simultaneously? If so, what do the lines between partitions within a stage refer to?

[Figure: example of a DAG]


Solution

  • Here is a good link for further reading. It explains the DAG in detail, along with a few other things that may be of interest: Databricks blog on DAG

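    I can try to explain. When an action is called, Spark splits the job into stages, and each stage is divided into a set of tasks. The driver then sends those tasks to the executors. The way your data is partitioned determines how many tasks there are: a stage over N partitions runs N tasks on the data across your cluster, and those tasks do run in parallel, as many at a time as there are free task slots (typically one per executor core). The arrows you are seeing are the execution plan: they show the order in which operations must happen, so, for example, the map function cannot run before the file has been read. Within a stage these are narrow dependencies, so each node that holds some of the data executes its tasks by applying the operations to its partition in the order given by the DAG.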
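As a rough illustration only (plain Python, not Spark's API), the sketch below simulates a single stage: a hypothetical text file already split into four partitions, one task per partition, and a thread pool standing in for two executor cores. The names `PARTITIONS`, `run_task`, and `run_stage` are made up for this example. Each task applies the stage's steps to its own partition in DAG order (read, then split, then map), and at most `cores` tasks run at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: a text file that has already been split into 4 partitions.
PARTITIONS = [
    ["a b", "c"],
    ["d e f"],
    ["g", "h i"],
    ["j"],
]

_lock = threading.Lock()
_running = 0      # tasks executing right now
max_running = 0   # highest number of tasks seen executing at the same time


def run_task(partition_id, lines):
    """One simulated task: apply the stage's steps, in DAG order, to one partition."""
    global _running, max_running
    with _lock:
        _running += 1
        max_running = max(max_running, _running)
    try:
        records = list(lines)                                  # 1. "read" the partition
        words = [w for line in records for w in line.split()]  # 2. flatMap-style split, needs the read first
        upper = [w.upper() for w in words]                     # 3. map, pipelined in the same task
        time.sleep(0.05)                                       # pretend the work takes a while
        return partition_id, upper
    finally:
        with _lock:
            _running -= 1


def run_stage(partitions, cores):
    """Launch one task per partition; at most `cores` run at once, the rest wait."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        futures = [pool.submit(run_task, pid, part) for pid, part in enumerate(partitions)]
        return [f.result() for f in futures]


for pid, out in run_stage(PARTITIONS, cores=2):
    print(f"partition {pid}: {out}")
print("max tasks running at once:", max_running)
```

With `cores=2`, two tasks overlap while the other two wait for a free slot, which loosely mirrors how Spark schedules a stage's tasks onto available task slots. Real Spark runs tasks inside executor JVMs spread across machines rather than as Python threads, so treat this only as a mental model.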