
PySpark - How to handle errors when looping over a list


I've written code to read some lake files whose locations are dynamically provided by a list called partition_paths:

dfs = [spark.read.parquet(f'{l}') for l in partition_paths]

I will then combine all these dfs into one in the next line:

df = reduce(DataFrame.unionAll, dfs)

But it's possible that the partition_paths are built up incorrectly, or that a location in the lake simply doesn't exist, so I need to add error handling to the first line of code. How can I do that so it won't just stop, but will continue collecting all the dfs?


Solution

  • I'm not deeply familiar with Spark, but you don't need a list comprehension here. Use a regular for loop instead, so each read can be wrapped in a try/except:

    from functools import reduce
    from pyspark.sql import DataFrame

    dfs = []
    for path in partition_paths:
        try:
            # Read each partition; skip this path if the read fails
            dfs.append(spark.read.parquet(path))
        except Exception as e:
            print(f"Error reading {path}: {e}")

    df = reduce(DataFrame.unionAll, dfs)
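
    If you'd rather skip only the paths that are missing or unreadable, while still surfacing unexpected failures, you can catch PySpark's AnalysisException specifically (the error Spark raises when a parquet path doesn't exist). A minimal sketch, assuming a standard PySpark setup with an active spark session:

        from functools import reduce
        from pyspark.sql import DataFrame
        from pyspark.sql.utils import AnalysisException

        dfs = []
        bad_paths = []
        for path in partition_paths:
            try:
                dfs.append(spark.read.parquet(path))
            except AnalysisException:
                # Raised by Spark when the path is malformed or absent from the lake
                bad_paths.append(path)

        if bad_paths:
            print(f"Skipped {len(bad_paths)} unreadable paths: {bad_paths}")

        df = reduce(DataFrame.unionAll, dfs)

    Collecting the failed paths instead of just printing them also makes it easier to log or retry them later.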