Tags: python, pandas, runtime-error, series

For Loop in Python Error: The truth value of a Series is ambiguous


Why does this for loop not work?

I want to create a new column, Delivery Year, from the columns below. They contain many NaNs, so the loop should go through the columns and return the first non-NaN value for each row. The preferred source is Delivery Date; if that is missing, Build Year; and if that is also missing, the year from In Service Date, the date the machine was put to work.

import pandas as pd

df = pd.DataFrame({
    "Platform ID": [1, 2, 3, 4],
    "Delivery Date": ["2009", float("nan"), float("nan"), float("nan")],
    "Build Year": [float("nan"), "2009", float("nan"), float("nan")],
    "In Service Date": [float("nan"), "14-11-2010", "14-11-2009", float("nan")],
})
df.dtypes
df

def delivery_year(delivery_year, build_year, service_year):
    out = []
    for i in range(0,len(delivery_year)):
        if delivery_year.notna():
            out[i].append(delivery_year)
        if (delivery_year[i].isna() and build_year[i].notna()):
            out[i].append(build_year)
        elif build_year[i].isna():
            out[i].append(service_year.str.strip().str[-4:])
        else:
            out[i].append(float("nan"))
    return out

df["Delivery Year"] = delivery_year(df["Delivery Date"], df["Build Year"], df["In Service Date"])

When I run this function, I get the following error and I do not know why:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The expected output (column Delivery Year) was attached as an image.


Solution

  • Update 3

    I rewrote your function in the same style as yours, without changing the logic or the types of your columns, so you can compare the two versions:

    def delivery_year(delivery_date, build_year, service_year):
        out = []
        for i in range(len(delivery_date)):
            if pd.notna(delivery_date[i]):
                out.append(delivery_date[i])
            elif pd.isna(delivery_date[i]) and pd.notna(build_year[i]):
                out.append(build_year[i])
            elif pd.isna(build_year[i]) and pd.notna(service_year[i]):
                out.append(service_year[i].strip()[-4:])
            else:
                out.append(float("nan"))
        return out
    
    df["Delivery Year"] = delivery_year(df["Delivery Date"],
                                        df["Build Year"],
                                        df["In Service Date"])
    

    Notes:

    1. I changed the name of your first parameter because delivery_year is also the name of your function, so it can be confusing.

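    2. I also replaced the .isna() and .notna() methods with their equivalent functions, pd.isna(...) and pd.notna(...), because a single element such as delivery_date[i] is a plain str or float and has no such methods.

    3. The second if became elif, so that only one branch runs per row.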
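
As a small check of note 2, here is a sketch that assumes the column holds plain Python strings and float NaN, as in the question's data. It shows that individual elements have no `.isna()` method, while the top-level `pd.isna` and `pd.notna` functions accept scalars:

```python
import pandas as pd

# Same shape of data as the question's "Delivery Date" column.
delivery_date = pd.Series(["2009", float("nan")], name="Delivery Date")

first, second = delivery_date[0], delivery_date[1]

# Elements are plain Python objects, not Series, so they have no .isna().
print(type(first).__name__, hasattr(first, "isna"))    # str False
print(type(second).__name__, hasattr(second, "isna"))  # float False

# pd.isna / pd.notna accept scalars and return a single truth value.
print(pd.notna(first), pd.isna(second))  # True True
```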

    Update 2

    Use combine_first to replace your function. combine_first keeps the values of the first Series ('Delivery Date') and fills its NaN entries with the values at the same index from the second Series. You can chain the calls to build 'Delivery Year'.

    df['Delivery Year'] = df['Delivery Date'] \
                              .combine_first(df['Build Year']) \
                              .combine_first(df['In Service Date'].str[-4:])
    

    Output:

    >>> df
       Platform ID Delivery Date Build Year In Service Date Delivery Year
    0            1          2009        NaN             NaN          2009
    1            2           NaN       2009      14-11-2010          2009
    2            3           NaN        NaN      14-11-2009          2009
    3            4           NaN        NaN             NaN           NaN
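
To see the priority order of chained combine_first calls in isolation, here is a minimal sketch on three made-up Series (the names `first`, `second`, `third` and their values are illustrative, not the question's data). Row 1 has values in both the second and third Series, and the earlier one wins:

```python
import pandas as pd

nan = float("nan")
first = pd.Series(["2009", nan, nan, nan])
second = pd.Series([nan, "2010", nan, nan])
third = pd.Series([nan, "14-11-2011", "14-11-2009", nan])

# Each call keeps the values already present and fills only the remaining gaps.
result = first.combine_first(second).combine_first(third.str[-4:])
print(result.tolist())  # ['2009', '2010', '2009', nan]
```

A row stays NaN only when all three sources are missing.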
    

    Update

    You forgot the [i]. Because each element is a plain scalar rather than a Series, call pd.notna(...) on it:

    if pd.notna(delivery_year[i]):
    

    Without the index, the condition is a whole boolean Series, and its truth value is ambiguous:

    >>> delivery_year.notna()
    0     True  # <- 2009
    1    False  # <- NaN
    2    False
    3    False
    Name: Delivery Date, dtype: bool
    

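    Should pandas treat the Series as True (because of 2009) or as False (because of the NaNs)? It cannot decide, so it raises the error.

    To get a single boolean from the whole Series, you have to aggregate the result with .any() or .all():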

    >>> delivery_year.notna().any()
    True  # because there is at least one non-NaN value.
    
    >>> delivery_year.notna().all()
    False  # because at least one value is NaN.
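
The error itself can be reproduced in a few lines. This is a minimal sketch on a made-up Series `s` shaped like the question's column; it first triggers the ValueError by using the whole boolean Series as a condition, then checks one element at a time, which is what the loop needs:

```python
import pandas as pd

s = pd.Series(["2009", float("nan"), float("nan")], name="Delivery Date")

# Using the whole boolean Series as an if-condition raises the error.
try:
    if s.notna():
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Checking one element at a time gives a single bool per row.
flags = [bool(pd.notna(value)) for value in s]
print(flags)  # [True, False, False]
```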