I'm trying to compute the length of the values in the object columns of a dataframe with Python. Many of my columns are strings with missing values, and unfortunately a missing value is counted as length 3 (because "NaN" is treated as a 3-character string).
Here's the code that I'm using:
df_string_mean_with_na = pd.DataFrame(df_string.applymap(len).astype(int).mean().to_dict(), index=[df_string.index.values[0]])
where df_string is my starting dataframe and I'm trying to calculate the average length of the values in each column. I would like missing values in object columns to count as length 0. Is there a way?
I think you need DataFrame.fillna to replace missing values with empty strings before counting the length:
print (Table1)
A B C
0 hello hi NaN
1 good hi so
2 home hello no
Test missing values:
print (Table1.isna())
A B C
0 False False True
1 False False False
2 False False False
df = Table1.fillna('').applymap(len).mean().to_frame().T
print (df)
A B C
0 4.333333 3.0 1.333333
Detail:
print (Table1.fillna('').applymap(len))
A B C
0 5 2 0
1 4 2 2
2 4 5 2
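Here is a self-contained sketch of the same idea that you can run as-is. The hand-built Table1 is an assumption reconstructed from the printed output above, and I've swapped applymap(len) for the vectorized .str.len() per column, since DataFrame.applymap is deprecated as of pandas 2.1 (DataFrame.map is its replacement):

```python
import numpy as np
import pandas as pd

# Assumed sample data, rebuilt from the printed Table1 above.
Table1 = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "B": ["hi", "hi", "hello"],
    "C": [np.nan, "so", "no"],
})

# Replace real NaN with '' so a missing value has length 0,
# then measure every column with the string accessor.
lengths = Table1.fillna("").apply(lambda col: col.str.len())

# Column C now has lengths 0, 2, 2, so the missing value pulls the mean down.
df = lengths.mean().to_frame().T
print(df)
```

The fillna('') step has to come before the length calculation, because .str.len() on a real NaN returns NaN rather than 0, and mean() would then skip that row instead of counting it as 0.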
If the missing values are actually the string 'NaN' rather than real NaN, use DataFrame.replace:
print (Table1.isna())
A B C
0 False False False
1 False False False
2 False False False
df = Table1.replace('NaN', '').applymap(len).mean().to_frame().T
print (df)
A B C
0 4.333333 3.0 1.333333
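If you are not sure which kind of missing value you have (or you have both), the two steps can be chained. This is only a sketch: mean_lengths and its placeholders parameter are hypothetical names I made up for illustration, and the sample frame is invented. Note that casting a column with astype(str) turns a real NaN into the string 'nan', so you may need to pass other spellings as placeholders:

```python
import numpy as np
import pandas as pd

def mean_lengths(frame, placeholders=("NaN",)):
    """Average string length per column, counting missing values as 0.

    Hypothetical helper: `placeholders` lists the strings that should be
    treated as missing in addition to real NaN values.
    """
    # Real NaN -> '', then placeholder strings such as 'NaN' -> ''.
    cleaned = frame.fillna("").replace(list(placeholders), "")
    return cleaned.apply(lambda col: col.str.len()).mean().to_frame().T

# Invented example mixing a real NaN and the string 'NaN' in column C.
mixed = pd.DataFrame({
    "A": ["hello", "good", "home"],
    "C": [np.nan, "NaN", "no"],
})

result = mean_lengths(mixed)
print(result)
```

Both missing entries in column C count as length 0 here, so only 'no' contributes to that column's average.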