dataframe, apache-spark, catalyst-optimizer

DataFrame API vs spark.sql


Does writing code with the DataFrame API rather than as spark.sql queries have any significant advantage?

I would also like to know whether the Catalyst optimizer works on spark.sql queries as well.


Solution

  • Both your DataFrame transformations and your Spark SQL queries are translated into an execution plan, and Catalyst optimizes that plan either way.

    The main advantage of the DataFrame API is that you can use DataFrame functions such as cache() on intermediate steps; in general you have more control over the execution plan.

    I also feel it is easier to test your code that way; people tend to write 1 huge query ...