Tags: apache-spark, pyspark, apache-spark-sql, databricks, azure-databricks

How to convert sql output to Dataframe?


I have a DataFrame from which I create a temporary view in order to run SQL queries. After a couple of SQL queries, I'd like to convert the output of the SQL query into a new DataFrame. The reason I want the data back in a DataFrame is so that I can save it to blob storage.

So, the question is: what is the proper way to convert SQL query output to a DataFrame?

Here's the code I have so far:

%scala
// read data from Azure blob
...
val df = spark.read.parquet(some_path)

// create temp view
df.createOrReplaceTempView("data_sample")

%sql
-- one of several SQL queries; the one below is just an example
SELECT
   date,
   count(*) AS cnt
FROM
   data_sample
GROUP BY
   date

Now I want a DataFrame that holds the output of the SQL query above. How do I do that? Preferably the code would be in Python or Scala.



Solution

`spark.sql(...)` runs a query against any registered temp view and returns the result as a DataFrame, so you can assign it directly:

  • Scala:

    val df = spark.sql("""
    SELECT
       date,
       count(*) AS cnt
    FROM
       data_sample
    GROUP BY
       date
    """)
    

  • PySpark (a plain triple-quoted string is enough here; an f-string is only needed if you interpolate variables, and it would choke on literal `{}` in the SQL):

    df = spark.sql('''
    SELECT
       date,
       count(*) AS cnt
    FROM
       data_sample
    GROUP BY
       date
    ''')