
PySpark - Synapse Notebook: don't throw an error if the dataframe finds no files


I have a Synapse notebook in which I am creating a dataframe based on parquet data. I am also filtering the files to ensure I only pick up new files.

ReadDF = spark.read.load(readPath,format="parquet", modifiedBefore=PLP___EndDate, modifiedAfter=PLP___StartDate)

If I set the StartDate variable to something in the future, which ensures that no files are found, I get the following error:

AnalysisException: Unable to infer schema for Parquet. It must be specified manually.

Is there a way to ignore this error, exactly like the "allow no files found" option in ADF Data Flow?


Solution

  • The above error occurs when there is no parquet file to read in the specified path.

    I pointed the read at an empty directory named ok and got the same error:

    spark.read.load('abfss://[email protected]/ok', format="parquet").show()
    


    You are giving a future date, which means it is the same as reading an empty directory with no parquet files.

    Is there a way to ignore this error, exactly like the "allow no files found" option in ADF Data Flow?

    AFAIK, Spark doesn't have that feature currently. One possible way to avoid this error is exception handling: put your code in a try block and handle the error as below.

    from pyspark.sql.utils import AnalysisException

    readpath = 'abfss://[email protected]/myparquet'
    # Contradictory window (modified after 2024-06-01 but before 2010-06-01) guarantees no files match
    modifiedBefore = '2010-06-01T13:00:00'
    modifiedAfter = '2024-06-01T13:00:00'

    try:
        df2 = spark.read.load(readpath, format="parquet",
                              modifiedBefore=modifiedBefore, modifiedAfter=modifiedAfter)
    except AnalysisException:
        # Raised when no parquet files match, so the schema cannot be inferred
        print("No files found with the above dates")
    

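    Another option in a Synapse notebook is to check whether the directory contains any files before reading, for example with mssparkutils.fs.ls. This is only a sketch: the listing verifies the folder is non-empty but does not apply the modifiedBefore/modifiedAfter filter, so keep the try/except if you rely on the date window.

    from notebookutils import mssparkutils

    # List the directory first and read only if it contains at least one entry
    files = mssparkutils.fs.ls(readpath)
    if files:
        df2 = spark.read.load(readpath, format="parquet",
                              modifiedBefore=modifiedBefore, modifiedAfter=modifiedAfter)
    else:
        print("No files found in", readpath)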