Tags: apache-spark, parsing, hive, hiveql

Fixing a valid Hive SQL query that throws a parsing exception in Spark SQL


The following query works fine when executed in Hive:

ALTER TABLE schema.the_table 
DROP IF EXISTS PARTITION (PARTITION_DATE<'2020-03-22')

PARTITION_DATE is of type String, but that's OK. The goal is to drop all partitions older than a particular date.

On the other hand, the same query run through Spark (2.1):

val spark = SparkSession.builder.enableHiveSupport.getOrCreate
spark.sql("""
ALTER TABLE schema.the_table 
DROP IF EXISTS PARTITION (PARTITION_DATE<'2020-03-22')
""")

throws a parsing exception:

User class threw exception: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {')', ','}(line 1, pos 82)

== SQL == ALTER TABLE schema.the_table DROP IF EXISTS PARTITION (PARTITION_DATE<'2020-03-22')

The question is: where can I find the characters that must be escaped in Spark SQL, and is there any way to write the same query in Spark SQL as in Hive?


Solution

  • Hive partitions cannot be dropped with a comparison operator in the partition spec (e.g. PARTITION_DATE<'2020-03-22') through spark.sql as of Spark version 3.0. This is a known bug, currently targeted to be fixed in version 3.2.0. Check the JIRA Spark-JIRA. Dropping a partition with an equality spec (PARTITION_DATE='...') does parse, which suggests a workaround: enumerate the partitions, filter them on the driver, and drop each one individually.
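A minimal sketch of that workaround, assuming a running SparkSession with Hive support and that the table is partitioned only by PARTITION_DATE (so each row of SHOW PARTITIONS looks like `partition_date=2020-03-20`). String comparison on ISO-formatted dates gives the correct ordering:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.enableHiveSupport.getOrCreate()

val cutoff = "2020-03-22"

// List all partitions of the table; each row holds a spec like
// "partition_date=2020-03-20".
spark.sql("SHOW PARTITIONS schema.the_table")
  .collect()
  .map(_.getString(0))
  .map(_.split("=")(1))          // extract the date part of the spec
  .filter(_ < cutoff)            // lexicographic compare works for ISO dates
  .foreach { d =>
    // Equality-based partition specs are accepted by the Spark SQL parser.
    spark.sql(
      s"ALTER TABLE schema.the_table DROP IF EXISTS PARTITION (PARTITION_DATE='$d')")
  }
```

This issues one ALTER TABLE per stale partition instead of a single range-based drop, so it is slower for tables with many partitions, but it avoids the parser limitation entirely.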