
Return logical plan using sparklyr


We are trying to get the logical plan (not to be confused with the physical plan) that Spark generates for a given query. According to the Spark docs, you should be able to retrieve it in Scala with:

df.explain(true)

or in sparklyr with the example code:

spark_version <- "2.4.3"
sc <- spark_connect(master = "local", version = spark_version)
iris_sdf <- copy_to(sc, iris)

iris_sdf %>% 
  spark_dataframe %>% 
  invoke("explain", TRUE)

This command runs but simply returns NULL in RStudio. My guess is that sparklyr does not retrieve content that is printed to the console. Is there a way around this, or another way to retrieve the logical plan using sparklyr? The physical plan is easy to get with dplyr::explain([your_sdf]), but that does not return the logical plan that was used to create it.


Solution

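  • explain() prints the plan on the JVM side and returns nothing, which is why invoke() gives you NULL. It looks like you can get the plans back into R by asking the underlying Spark DataFrame for its queryExecution object and printing its string representation: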

    iris_sdf %>% 
      spark_dataframe %>% 
      invoke("queryExecution") %>%
      invoke("toString") %>%
      cat()
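
If you want the plans as R strings rather than console output, something along these lines may help. This is only a sketch: get_plan and split_plan_sections are hypothetical helper names, and it assumes the QueryExecution accessors logical, analyzed, optimizedPlan and executedPlan are reachable through invoke(), and that the toString output uses "== Section Name ==" header lines as Spark 2.4 does.

```r
# Sketch only: helper names are hypothetical, not part of sparklyr.

# Return one stage of the plan as a single R string.
# Assumes `sdf` is a tbl_spark (e.g. iris_sdf) and that QueryExecution
# exposes these accessors to invoke().
get_plan <- function(sdf,
                     stage = c("logical", "analyzed",
                               "optimizedPlan", "executedPlan")) {
  stage <- match.arg(stage)
  qe <- sparklyr::invoke(sparklyr::spark_dataframe(sdf), "queryExecution")
  sparklyr::invoke(sparklyr::invoke(qe, stage), "toString")
}

# Split the combined queryExecution toString() text into a named list,
# one element per "== Section Name ==" header. Pure R, no Spark needed.
split_plan_sections <- function(plan_text) {
  lines <- strsplit(plan_text, "\n", fixed = TRUE)[[1]]
  is_header <- grepl("^== .+ ==$", lines)
  group <- cumsum(is_header)            # index of the section each line is in
  keep <- group > 0 & !is_header        # body lines that follow some header
  titles <- gsub("^== | ==$", "", lines[is_header])
  sections <- lapply(seq_along(titles), function(i) {
    paste(lines[keep & group == i], collapse = "\n")
  })
  names(sections) <- titles
  sections
}

# Usage, assuming the connection and iris_sdf from the question:
# cat(get_plan(iris_sdf, "optimizedPlan"))
# plan_text <- iris_sdf %>% spark_dataframe() %>%
#   invoke("queryExecution") %>% invoke("toString")
# names(split_plan_sections(plan_text))
```

Keeping the text in a variable instead of piping it straight into cat() also makes it easy to log or diff the plans between runs.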