Search code examples
apache-sparkapache-spark-sqldatastaxdatabricks

How to build a dynamic query string to execute on a spark-sql 2.3.1 dataframe?


I have a requirement to get the where condition passed by user as program arguments. Based on the where condition i need to query the source data base.

I am using spark-sql.2.3.1 How to construct and pass/executive dynamically build query?

Sample query:

select ProductId, COUNT(*) AS ProductSaleCount
 from productsale
 where to_date(Date) >= "2015-12-17"
 and to_date(Date) <= "2015-12-31"
 group by ProductId

Solution

  • All you have to do in your scenario is create a query string which would go something like:

    val query = "select ProductId, COUNT(*) AS ProductSaleCount from productsale where to_date(Date) >= "+ fromDate +" and to_date(Date) <= " + toDate + " group by ProductId"
    

    the fromDate and toDate, you would get from your arguments, perhaps.

    To use this, however is a different issue and it depends on your database

    For hive you can simply register your spark session with enableHiveSupport

    val spark = SparkSession.builder().appName("My App").enableHiveSupport().config("spark.sql.warehouse.dir", warehouseLocation).getOrCreate()
    
    val data = spark.sqlContext.sql(query)
    

    If the data is in a dataframe and you want to query that, you would have to create a view and then run your query on that

    finalDataFrame.createOrReplaceTempView("productsale")
    
    val data = spark.sqlContext.sql(query)
    

    Hope this helps