Search code examples
pythonpandasdataframefillna

Filling Missing Values Based on String Condition


I'm trying to write a function to impute some null values from a Numeric column based on string conditions from a Text column.

My attempt example:

def fill_nulls(string, val):
if df['TextColumn'].str.contains(string) == True:
    df['NumericColumn'] = df['NumericColumn'].fillna(value=val)

The 'string' and 'val' parameters are manually entered. I tried applying function to my numeric column but it gives me this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried to find examples that I could tweak for my situation, but they all involved using 'groupby' to get the average numeric values relating to the discrete string values that had only a handful of unique values. Basically, only exact wording could be imputed, whereas I'm trying to generalize my string filtering by using partial strings and imputing the null values in the numeric column based on the resulting rows of the text columns.


Solution

  • Use Series.str.contains with DataFrame.loc:

    m = df['TextColumn'].str.contains(string)
    df.loc[m, 'NumericColumn'] = df.loc[m, 'NumericColumn'].fillna(value=val)
    

    Or chain conditions by & for bitwise AND for test missing values by Series.isna and assign value in DataFrame.loc:

    m1 = df['TextColumn'].str.contains(string)
    m2 = df['NumericColumn'].isna()
    df.loc[m1 & m2, 'NumericColumn'] = val