I am trying to convert SQL code into PySpark SQL. When selecting columns from a table, the SELECT statement contains something like the following:
Select a.`(column1|column2|column3)?+.+`,trim(column c) from Table a;
I would like to understand what
a.`(column1|column2|column3)?+.+`
expression resolves to and what it actually implies. How should I handle this when converting the SQL to PySpark?
That is a way of selecting columns by name using a regular expression. This regex matches every column whose name is not exactly column1, column2 or column3, so those three columns are excluded from the result.
It is Spark's equivalent of Hive's quoted identifiers. See also Spark's documentation.
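To see why the pattern excludes those three names, here is a small plain-Python sketch. The optional group is possessive (`?+`), so once it has consumed `column1` it cannot give it back, and `.+` then has nothing left to match. Python's `re` only supports possessive quantifiers from 3.11 onward, so the sketch uses a negative lookahead that I believe is equivalent for these three names. It also assumes that the whole column name must match and that matching is case-sensitive. The function name `select_by_pattern` is made up for illustration.

```python
import re

# Stand-in for the Java-style pattern `(column1|column2|column3)?+.+`.
# Assumption: the whole column name has to match, so a name is rejected
# only when it is exactly column1, column2 or column3.
PATTERN = re.compile(r"(?!(?:column1|column2|column3)\Z).+")


def select_by_pattern(all_columns):
    """Return the column names the pattern would keep, in their original order."""
    return [name for name in all_columns if PATTERN.fullmatch(name)]


if __name__ == "__main__":
    columns = ["column1", "column2", "column3", "column10", "c"]
    # column10 is kept: after the group consumes "column1", ".+" still has "0" to match.
    print(select_by_pattern(columns))
```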
Be aware that this behavior is disabled by default. To enable it, first run the following command (shown in Scala syntax; in PySpark, write show(truncate=False) instead of show(false)):
spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true").show(false)
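For the PySpark conversion itself, one option is to skip the regex and build the column list explicitly with the DataFrame API. The sketch below is only an illustration under a few assumptions: the helper names `columns_except` and `select_like_sql` are hypothetical, the trimmed column is assumed to be called `c` (the original query is ambiguous about it), and the alias `c_trimmed` is invented. `pyspark` is imported inside the function so that the list helper can be tried without a Spark installation.

```python
def columns_except(all_columns, excluded):
    """Return the names in all_columns that are not in excluded, preserving order."""
    excluded_names = set(excluded)
    return [name for name in all_columns if name not in excluded_names]


def select_like_sql(df):
    """Rough DataFrame equivalent of:
    SELECT a.`(column1|column2|column3)?+.+`, trim(c) FROM Table a
    """
    # Imported lazily so columns_except stays usable without pyspark.
    from pyspark.sql import functions as F

    kept = columns_except(df.columns, ["column1", "column2", "column3"])
    # Assumption: the column being trimmed is named "c"; the alias is hypothetical.
    return df.select(*kept, F.trim(F.col("c")).alias("c_trimmed"))
```

With a DataFrame `df` loaded from the table, `select_like_sql(df)` would return every column except the three excluded ones, plus the trimmed column. Alternatively, once the setting above is enabled, the original query can probably be passed to spark.sql unchanged.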