Search code examples
pysparkdatabrickssparkr

Databricks: how to convert Spark dataframe under %r to dataframe under %python


I found some tips about converting a pyspark dataframe to R, but I need to perform the opposite task: convert a R dataframe to pyspark

Anyone knows how to do it?


Solution

  • You can use the same approach as for other languages - use createOrReplaceTempView function to register your dataframe, and then use spark.sql from another language to access its content.

    For example. If R side looks as following:

    %r
    library(SparkR)
    id <- c(rep(1, 3), rep(2, 3), 3)
    desc <- c('New', 'New', 'Good', 'New', 'Good', 'Good', 'New')
    df <- data.frame(id, desc)
    df <- createDataFrame(df)
    createOrReplaceTempView(df, "test_df")
    head(df)
    
      id desc
    1  1  New
    2  1  New
    3  1 Good
    4  2  New
    5  2 Good
    6  2 Good
    

    then you can access these data from Python:

    df = spark.sql("select * from test_df")
    df.show()
    
    +---+----+
    | id|desc|
    +---+----+
    |1.0| New|
    |1.0| New|
    |1.0|Good|
    |2.0| New|
    |2.0|Good|
    |2.0|Good|
    |3.0| New|
    +---+----+