Issue:
I'm trying to write to parquet file using spark.sql, however I encounter issues when having unions or subqueries. I know there's some syntax I can't seem to figure out.
Ex.
%python
df = spark.sql("SELECT
sha2(Code, 256) as COUNTRY_SK,
Code as COUNTRY_CODE,
Name as COUNTRY_NAME,
current_date() as EXTRACT_DATE
FROM raw.EXTR_COUNTRY)
UNION ALL
SELECT
-1 as COUNTRY_SK,
'Unknown' as COUNTRY_CODE,
'Unknown' as COUNTRY_NAME,
current_date() as EXTRACT_DATE")
df.write.parquet("dbfs:/mnt/devstorage/landing/companyx/country",
mode="overwrite")
WHEN doing a simple query I have no issues at all, such as:
%python
df = spark.sql("select * from raw.EXTR_COUNTRY")
df.write.parquet("dbfs:/mnt/devstorage/landing/companyx/country/",
mode="overwrite")
The quotes solved the issue, the sql-script itself wasn't the issue. So using tripple quotes (""" or ''') solved the issue.
%python
df = spark.sql("""SELECT
sha2(Code, 256) as COUNTRY_SK,
Code as COUNTRY_CODE,
Name as COUNTRY_NAME,
current_date() as EXTRACT_DATE
FROM raw.EXTR_COUNTRY)
UNION ALL
SELECT
-1 as COUNTRY_SK,
'Unknown' as COUNTRY_CODE,
'Unknown' as COUNTRY_NAME,
current_date() as EXTRACT_DATE""")
df.write.parquet("dbfs:/mnt/devstorage/landing/companyx/country",
mode="overwrite")