Search code examples
apache-sparkpysparkhive

Pyspark sql to create hive partitioned table


I am trying to create a hive paritioned table from pyspark dataframe using spark sql. Below is the command I am executing, but getting an error. Error message below.

df.createOrReplaceTempView(df_view)
spark.sql("create table if not exists tablename PARTITION (date) AS select * from df_view")

Error: pyspark.sql.utils.ParseException:u"\nmismatched input 'PARTITION' expecting <EOF>

When I try to run without PARTITION (date) in the above line it works fine. However I am unable to create with partition.

How to create table with partition and insert date from.pyspark dataframe to hive.


Solution

  • To address this I created the table first

    spark.sql("create table if not exists table_name (name STRING,age INT) partitioned by (date_column STRING)")

    Then set dynamic partition to nonstrict using below. spark.sql("SET hive.exec.dynamic.partition = true") spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict") spark.sql("insert into table table_name PARTITION (date_column) select *,'%s from df_view" % current_date))

    Where current date is a variable with today's date.