Tags: pyspark, hive, apache-spark-sql

Using a loop to create Spark SQL queries


I am trying to create some Spark SQL queries for different tables which I have collected in a list. I want to create SQL queries for all the tables present in the Hive database. The Hive context has been initialized. Following is my approach:

tables = spark.sql("show tables in survey_db")

# registering the dataframe as a temp view (its columns include database and tableName)
tables.createOrReplaceTempView("table_list")

# collecting my table names in a python list
table_array = spark.sql("select collect_list(tableName) from table_list").collect()[0][0]

# the resulting list looks like this:
table_array = [u'survey', u'market', u'customer']
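As an aside, the temp view and the collect_list aggregation are not strictly needed to get this list; a minimal sketch, assuming a SparkSession named spark and the same survey_db database:

# collect table names straight from the SHOW TABLES result
table_array = [row.tableName for row in spark.sql("show tables in survey_db").collect()]

# or, equivalently, through the catalog API
table_array = [t.name for t in spark.catalog.listTables("survey_db")]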

I want to create Spark SQL queries for the table names stored in table_array. For example:

for i in table_array:
    spark.sql("select * from survey_db.'i'")

I can't use shell scripting, as I have to write a PySpark script for this. Please advise whether spark.sql queries can be created using a loop/map. Thanks everyone.


Solution

  • You can build the query strings in a list comprehension and then run them in a loop:

    # build one query string per table (f-strings require Python 3.6+)
    sql_list = [f"select * from survey_db.{table}" for table in table_array]
    for sql in sql_list:
        df = spark.sql(sql)
        df.show()
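
  • If you want to keep each result around instead of just displaying it, a small extension of the same idea collects one DataFrame per table (the results dictionary is just illustrative, assuming the same spark session and table_array):

    # keep one DataFrame per table, keyed by table name, for later use
    results = {}
    for table in table_array:
        results[table] = spark.sql(f"select * from survey_db.{table}")

    # example use: print the row count of each table
    # (note that count() triggers a full job per table)
    for table, df in results.items():
        print(table, df.count())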