Tags: r, parquet, databricks, sparkr, sparklyr

What is the difference between a dataframe created using SparkR and a dataframe created using sparklyr?


I am reading a parquet file in Azure Databricks:

  • Using SparkR: read.parquet()

  • Using sparklyr: spark_read_parquet()

The two dataframes are different. Is there any way to convert a SparkR dataframe into a sparklyr dataframe, and vice versa?
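For reference, a minimal sketch of the two reads (the file path and table name are placeholders, and a Databricks cluster with both packages available is assumed):

```r
library(SparkR)
library(sparklyr)

# sparklyr needs an explicit connection object on Databricks
sc <- spark_connect(method = "databricks")

# SparkR: returns a SparkDataFrame (S4 object)
sdf <- SparkR::read.parquet("/mnt/data/example.parquet")      # hypothetical path

# sparklyr: returns a tbl_spark (a lazy dplyr/Spark SQL table)
tbl <- spark_read_parquet(sc, name = "example",
                          path = "/mnt/data/example.parquet") # hypothetical path

class(sdf)  # "SparkDataFrame"
class(tbl)  # "tbl_spark" "tbl_sql" "tbl_lazy" "tbl"
```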


Solution

  • sparklyr creates a tbl_spark. This is essentially a lazy query expressed in Spark SQL. SparkR creates a SparkDataFrame, which is a distributed collection of data organized into named columns and backed by a logical plan.

    In the same way that you can't use a tbl as a normal data.frame, you can't use a tbl_spark the same way as a SparkDataFrame.

    The only way I can think of to convert one into the other would be to write it out to your data lake / data warehouse and read it back with the other API, or to collect it into R first.
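Both routes can be sketched as follows. The paths, table names, and connection are placeholders, and collecting into R is only sensible when the data fits in driver memory:

```r
library(SparkR)
library(sparklyr)

sc <- spark_connect(method = "databricks")

# Route 1: round-trip through storage ------------------------------------
# SparkR -> sparklyr: write with SparkR, read back with sparklyr
sdf <- SparkR::read.parquet("/mnt/data/example.parquet")     # hypothetical path
SparkR::write.parquet(sdf, "/mnt/tmp/handoff.parquet")       # hypothetical path
tbl <- spark_read_parquet(sc, name = "handoff",
                          path = "/mnt/tmp/handoff.parquet")

# sparklyr -> SparkR: write with sparklyr, read back with SparkR
spark_write_parquet(tbl, "/mnt/tmp/handoff2.parquet")        # hypothetical path
sdf2 <- SparkR::read.parquet("/mnt/tmp/handoff2.parquet")

# Route 2: collect into a local R data.frame (small data only) -----------
local_df <- SparkR::collect(sdf)           # SparkDataFrame -> data.frame
tbl2 <- copy_to(sc, local_df, "local_df")  # data.frame -> tbl_spark
```

The storage round-trip keeps the data distributed and works at any scale; the collect-and-copy route is simpler but pulls everything through the driver node.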