apache-spark-sql

Run a SQL query on a PySpark DataFrame


I am using Databricks and I have already loaded some data tables.

However, I have a complex SQL query that I want to run against these data tables, and I wonder if I could avoid translating it into PySpark.

Is that possible?

To give an example:

In SQL:

with table2 as (
    select column1, column2
    from database.table1
    where
        start_date <= DATE '2019-03-01' and
        end_date >= DATE '2019-03-31'
)

In PySpark I would already have table1 loaded, but the following does not work because it cannot find table1.

query = "(
    select column1, column1
           from table1
               where
         start_date <= DATE '2019-03-01' and
         end_date >= DATE '2019-03-31'            )"
table2 = spark.sql(query)

Thanks


Solution

  • Try giving database_name.table_name instead of just table_name in the query.

    query = "(
        select column1, column1
               from *database_name.table_name* 
                   where
             start_date <= DATE '2019-03-01' and
             end_date >= DATE '2019-03-31' )" 
    

    If you are using PySpark, run the query with:

    table2 = spark.sql(query)
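
  • Alternatively, if table1 is only loaded as a DataFrame in your session and is not registered as a table in the metastore, you can register it as a temporary view so that spark.sql can resolve it by name. Here is a minimal sketch, assuming df_table1 is the variable holding the already-loaded DataFrame (the variable name is illustrative):

    # df_table1 is assumed to be the DataFrame already loaded in the session
    # (hypothetical name); register it under the name the query expects.
    df_table1.createOrReplaceTempView("table1")

    query = """
        select column1, column2
        from table1
        where
            start_date <= DATE '2019-03-01' and
            end_date >= DATE '2019-03-31'
    """
    table2 = spark.sql(query)  # returns a new DataFrame

    The temporary view lives only for the duration of the Spark session, so this does not modify anything in the database itself.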