
PySpark - Synapse Notebook: don't throw an error if the dataframe finds no files


I have a Synapse notebook in which I am creating a dataframe based on parquet data. I am also filtering the files to ensure I only pick up new files.

ReadDF = spark.read.load(readPath,format="parquet", modifiedBefore=PLP___EndDate, modifiedAfter=PLP___StartDate)

If I set the StartDate variable to something in the future, which ensures that no files are found, I get the following error:

AnalysisException: Unable to infer schema for Parquet. It must be specified manually.

Is there a way to ignore this error, exactly like the "allow no files found" option in ADF Data Flow?


Solution

  • The above error occurs when there is no parquet file to read in the specified path.

    I pointed the read at an empty directory named ok and got the same error:

    spark.read.load('abfss://[email protected]/ok', format="parquet").show()
    


    You are giving a future date, which means it is the same as reading an empty directory with no parquet files.

    Is there a way to ignore this error, exactly like the "allow no files found" option in ADF Data Flow?

    AFAIK, Spark doesn't have that feature currently. One possible way to avoid this error is exception handling: put your code in a try block and handle the error as below.

    from pyspark.sql.utils import AnalysisException

    readpath = 'abfss://[email protected]/myparquet'
    # Contradictory window (modified after 2024-06-01 but before 2010-06-01) guarantees no files match
    modifiedBefore = '2010-06-01T13:00:00'
    modifiedAfter = '2024-06-01T13:00:00'

    try:
        df2 = spark.read.load(readpath, format="parquet",
                              modifiedBefore=modifiedBefore, modifiedAfter=modifiedAfter)
    except AnalysisException:
        # Raised when no parquet files match, so the schema cannot be inferred
        print("No files found with the above dates")
    

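    Another option in a Synapse notebook is to check whether the directory contains any files before reading, for example with mssparkutils.fs.ls. This is only a sketch: the listing verifies the folder is non-empty but does not apply the modifiedBefore/modifiedAfter filter, so keep the try/except if you rely on the date window.

    from notebookutils import mssparkutils

    # List the directory first and read only if it contains at least one entry
    files = mssparkutils.fs.ls(readpath)
    if files:
        df2 = spark.read.load(readpath, format="parquet",
                              modifiedBefore=modifiedBefore, modifiedAfter=modifiedAfter)
    else:
        print("No files found in", readpath)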