apache-spark, pyspark, hive

Passing variables to a Hive query in PySpark SQL


I am trying to execute a query against a Hive table using Spark SQL.

The code below works fine:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").enableHiveSupport().appName("test").getOrCreate()
df = spark.sql("select * from table_name where date='2021-05-16' and name='xxxx'")

But I want to pass date and name as variables rather than hardcoding them into the SQL.

Is there a way to pass date=current_date instead of hardcoding the value?

I am trying to pass the current date (obtained with time.strftime) as date, and name has to come from another variable, e.g. name='xxxx'.
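
For context, the variables I want to substitute look roughly like this (just a sketch):

import time

current_date = time.strftime("%Y-%m-%d")  # today's date as 'YYYY-MM-DD', e.g. '2021-05-16'
name = "xxxx"                             # this value comes from another variable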


Solution

  • Can you pass the variables from outside of the .py file?

    If so, you can try this:

    import sys

    # read the date passed on the command line
    # (spark is the SparkSession created as in the question)
    day = sys.argv[1]
    df = spark.sql("select * from table_name where date='%s'" % day)

    Then submit the script with the date as an argument:

    spark-submit --master yarn test.py 2021-09-17
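
    If you also want to default to the current date when no argument is given, and take name from another variable, a minimal sketch could look like this (the table and column names are just placeholders for yours):

    import sys
    import time

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").enableHiveSupport().appName("test").getOrCreate()

    # use the first command-line argument if present, otherwise today's date
    day = sys.argv[1] if len(sys.argv) > 1 else time.strftime("%Y-%m-%d")
    name = "xxxx"  # comes from wherever your other variable is set

    df = spark.sql("select * from table_name where date='{}' and name='{}'".format(day, name))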