apache-spark pyspark apache-zeppelin

How do I print out a spark.sql object?


I have a spark.sql query that uses a couple of variables.

import com.github.nscala_time.time.Imports.LocalDate

val start_date = new LocalDate(2020, 4, 1)
val end_date = new LocalDate(2020, 4, 7)

val mydf = spark.sql(s"""
        select *
        from tempView
        where timestamp between '{0}' and '{1}'
""".format(start_date.toString, end_date.toString))

I want to print out the query behind mydf because I ran mydf.count and got 0 as the result.

When I run just mydf, I get back mydf: org.apache.spark.sql.DataFrame = [column: type].

I also tried println(mydf), but it doesn't print the query either.

There is this related question, but it does not have the answer.

How can I print out the query?


Solution

  • The easiest way is to store your query in a variable first, then print the variable to see the query.

    • Use the variable in spark.sql

    Example:

    In Spark-Scala:

    val start_date = "2020-01-01"
    val end_date = "2020-02-02"
    val query = s"""select * from tempView where timestamp between '${start_date}' and '${end_date}'"""
    println(query)
    //select * from tempView where timestamp between '2020-01-01' and '2020-02-02'
    
    spark.sql(query)
    
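    As an aside, the .format call in the question's Scala snippet does not substitute anything: Scala's String.format uses %s-style (java.util.Formatter) placeholders, so '{0}' and '{1}' stay in the SQL as literal text, which is most likely why mydf.count is 0. Below is a minimal sketch of the same query using Scala string interpolation instead, assuming the tempView and LocalDate values from the question:

    import com.github.nscala_time.time.Imports.LocalDate
    
    val start_date = new LocalDate(2020, 4, 1)
    val end_date = new LocalDate(2020, 4, 7)
    
    // ${...} is substituted when the string is built, so the actual dates
    // end up in the SQL text instead of literal '{0}' / '{1}' placeholders
    val query = s"""select * from tempView
                    where timestamp between '${start_date.toString}' and '${end_date.toString}'"""
    println(query)
    val mydf = spark.sql(query)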

    In PySpark:

    start_date = "2020-01-01"
    end_date = "2020-02-02"
    query = """select * from tempView where timestamp between '{0}' and '{1}'""".format(start_date, end_date)
    
    print(query)
    #select * from tempView where timestamp between '2020-01-01' and '2020-02-02'
    
    #use the same query in spark.sql
    spark.sql(query)
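
    Once spark.sql has run, the resulting DataFrame no longer carries the original SQL text, which is why println(mydf) only shows the schema. If you still want to inspect what Spark will execute for an existing DataFrame, the closest you can get is printing its query plan, for example in Scala:

    // prints the parsed, analyzed, optimized and physical plans for the DataFrame
    mydf.explain(true)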