python pandas numpy count string-length

How to count length of missing values for string variables as zero?


I'm trying to count the length of object variables in a DataFrame with Python. Many of my variables are strings with missing values, and when I count the length of a missing value it shows as 3 (because "NaN" is counted as a 3-character value).

Here's the code that I'm using:

df_string_mean_with_na = pd.DataFrame(df_string.applymap(len).astype(int).mean().to_dict(), index=[df_string.index.values[0]])

where df_string is my starting DataFrame. I'm trying to calculate the average length of the values in each column. I would like the length of missing values in object variables to count as 0. Is there a way to do this?


Solution

  • I think you need DataFrame.fillna to replace missing values with empty strings before counting the length:

    print (Table1)
           A      B    C
    0  hello     hi  NaN
    1   good     hi   so
    2   home  hello   no
    

    Test for missing values:

    print (Table1.isna())
           A      B      C
    0  False  False   True
    1  False  False  False
    2  False  False  False
    
    df = Table1.fillna('').applymap(len).mean().to_frame().T
    print (df)
              A    B         C
    0  4.333333  3.0  1.333333
    

    Detail:

    print (Table1.fillna('').applymap(len))
       A  B  C
    0  5  2  0
    1  4  2  2
    2  4  5  2
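
As a cross-check on the numbers above, here is a minimal self-contained sketch of the same idea. It rebuilds Table1 from the printed sample and, as an alternative to fillna('') plus applymap(len), uses Series.str.len(), which leaves missing entries as NaN so the lengths themselves can be filled with 0. The variable names lengths and avg are only illustrative. This variant also avoids applymap, which newer pandas releases (2.1 and later) deprecate in favor of DataFrame.map.

```python
import numpy as np
import pandas as pd

# Same sample data as Table1 above.
Table1 = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "B": ["hi", "hi", "hello"],
    "C": [np.nan, "so", "no"],
})

# Series.str.len() returns NaN for missing entries, so fill the
# resulting lengths with 0 instead of filling the strings with ''.
lengths = Table1.apply(lambda col: col.str.len()).fillna(0)

# One-row frame holding the average length per column.
avg = lengths.mean().to_frame().T
print(avg)
```

Column C has lengths 0, 2 and 2, so its average is 4/3, matching the Detail table.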
    

    If the missing values are stored as strings (the literal text 'NaN'), use DataFrame.replace instead:

    print (Table1.isna())
           A      B      C
    0  False  False  False
    1  False  False  False
    2  False  False  False
    
    df = Table1.replace('NaN', '').applymap(len).mean().to_frame().T
    print (df)
              A    B         C
    0  4.333333  3.0  1.333333
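
A length of 3 for a missing value usually means the value has already become text somewhere upstream. The sketch below first shows that the text form of NaN has three characters, then handles a hypothetical frame in which the gap in column C is the literal string 'NaN'. The placeholders list is an assumption; adjust it to whatever spellings actually appear in your data.

```python
import pandas as pd

# A missing value that has been converted to text has a real length.
print(len(str(float("nan"))))

# Hypothetical frame where the gap in column C is the literal text 'NaN'.
Table1 = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "B": ["hi", "hi", "hello"],
    "C": ["NaN", "so", "no"],
})

# Assumed placeholder spellings for "missing"; not taken from the question.
placeholders = ["NaN", "nan"]

# Replace the placeholder text with empty strings, then measure lengths.
cleaned = Table1.replace(placeholders, "")
lengths = cleaned.apply(lambda col: col.str.len())

# One-row frame holding the average length per column.
avg = lengths.mean().to_frame().T
print(avg)
```

Note that this also blanks out any genuine value that happens to be spelled exactly like one of the placeholders, so keep the list as narrow as the data allows.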