apache-spark pyspark

Where can I find detailed information on every step of a Spark physical plan?


I am completely new to Spark and I am trying to use the Spark Web UI to read plans, DAGs and the like. Honestly, I find the lack of information on Spark's physical and logical plans very frustrating. For example, I spent two hours straight trying to find information on the BatchScan step in a physical plan. So my question is: is there some sort of encyclopedia for this kind of thing, with all the parameters explained? Or is there at least a cheat sheet for the basic Spark query plan steps?

Thank you in advance!


Solution

  • In my opinion, the best single source of knowledge for Spark SQL details is this book by Jacek Laskowski: The Internals of Spark SQL

    There you can find high-level descriptions of the components as well as detailed descriptions of each step (try its search to find the documentation for BatchScan).

    You can also always take a look at the Spark source code; for example, you can read the code for BatchScanExec.
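To make that lookup a bit less manual, here is a minimal sketch in plain Python that pulls operator names out of plan text (copied from the Web UI or from `df.explain()` output) and pairs each with a likely class name to search for in the book or the source. It assumes the naming pattern seen above, where the plan step BatchScan corresponds to the class BatchScanExec; not every operator follows that pattern, so treat the results as search hints only. `SAMPLE_PLAN`, `operator_names` and `search_terms` are hypothetical names made up for this illustration, and the sample plan is hand-written rather than real Spark output.

```python
import re

# Hand-written sample in the style of a formatted physical plan.
# This is an illustration, not real Spark output.
SAMPLE_PLAN = """\
== Physical Plan ==
* Project (3)
+- * Filter (2)
   +- BatchScan (1)
"""


def operator_names(plan_text):
    """Return the operator names found in plan text, in order, without duplicates."""
    names = []
    for line in plan_text.splitlines():
        if line.startswith("=="):
            continue  # section header such as "== Physical Plan =="
        # Skip tree-drawing characters and the '*' marker, then take the first word.
        match = re.match(r"[\s+\-:*|]*([A-Za-z]+)", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def search_terms(plan_text):
    """Pair each operator name with a candidate class name (assumed 'Exec' suffix)."""
    return [(name, name + "Exec") for name in operator_names(plan_text)]


for name, cls in search_terms(SAMPLE_PLAN):
    print(name, "->", cls)
```

Searching for both forms helps because the plan shows the short step name, while the source code and much of the book are organized by class name.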