python-3.x pandas fillna

pandas.DataFrame() mixed data types and strange .fillna() behaviour


I have a DataFrame with two dtypes: object (I was expecting string) and datetime64 (as expected). I don't understand this behaviour or why it affects my fillna().

Calling .fillna() with inplace=True wipes out the data shown as int64, even though I had converted it with .astype(str).

Calling .fillna() without inplace=True does nothing.

I know pandas/NumPy dtypes differ from native Python types, but is this correct behaviour, or am I getting something terribly wrong?

Sample:

import random
import numpy as np
import pandas as pd

sample = pd.DataFrame({'A': [random.choice(['aabb', np.nan, 'bbcc', 'ccdd']) for x in range(15)],
                       'B': [random.choice(['2019-11-30', '2020-06-30', '2018-12-31', '2019-03-31']) for x in range(15)]})
# assign the whole column so that B gets a datetime64 dtype
sample['B'] = pd.to_datetime(sample['B'])
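On the dtype question: in pandas 1.x and 2.x, a column of Python strings (with or without NaN) is stored with the generic object dtype, while assigning pd.to_datetime(...) to the whole column gives a datetime64 column. The sketch below assumes a three-row frame with made-up values and uses print(df.dtypes) to inspect the result. It also shows astype('string') (the nullable pd.StringDtype) as an opt-in alternative to astype(str); the question itself does not use it.

```python
import numpy as np
import pandas as pd

# Illustrative frame shaped like the sample above (values are made up).
df = pd.DataFrame({'A': ['aabb', np.nan, 'bbcc'],
                   'B': ['2019-11-30', '2020-06-30', '2018-12-31']})
df['B'] = pd.to_datetime(df['B'])

# Shows the dtype pandas chose for each column.
print(df.dtypes)

# Opt-in nullable string dtype: missing values stay missing (<NA>)
# instead of becoming the text 'nan'.
a_string = df['A'].astype('string')
print(a_string.isna().tolist())  # [False, True, False]
```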

for col in sample.select_dtypes(include='object').columns.tolist():
    sample.loc[:, col].astype(str).apply(lambda x: str(x).strip().lower()).fillna('NULL')

for col in sample.columns:
    print(sample[col].value_counts().head(15))
    print('\n')

Here neither 'NULL' nor 'nan' appears. I added .replace('nan', 'NULL'), but still nothing. Can you give me a clue what to look for, please? Many thanks.
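The following is a minimal sketch of what the loop above does. It assumes an object-dtype column like A with three made-up values. The chained expression builds a new Series that is never assigned back. In addition, str(x) inside the lambda turns every NaN into the ordinary string 'nan', so fillna finds nothing to fill. The sketch drops .astype(str) because the lambda alone is enough to show the effect. In pandas 1.x/2.x, .astype(str) performs the same NaN-to-'nan' conversion before the lambda runs.

```python
import numpy as np
import pandas as pd

# Illustrative object-dtype column with one missing value.
df = pd.DataFrame({'A': pd.Series(['aabb', np.nan, ' BBCC '], dtype=object)})

# str() turns a float NaN into the ordinary string 'nan'.
print(repr(str(np.nan)))  # 'nan'

# Same chain as in the question (minus .astype(str)): no NaN is left for fillna.
result = df['A'].apply(lambda x: str(x).strip().lower()).fillna('NULL')
print(result.tolist())  # ['aabb', 'nan', 'bbcc']

# The chain returned a new Series; df itself was never modified.
print(df['A'].isna().sum())  # 1
```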


Solution

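  • The problem here is converting missing values to strings: .astype(str) (and str(x) in the lambda) turns NaN into the text 'nan', so fillna has no missing values left to fill. The result of the chain is also never assigned back to the column. The solution is to use the pandas functions Series.str.strip and Series.str.lower, which handle missing values nicely (NaN is passed through unchanged), and to assign the result back: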

    for col in sample.select_dtypes(include='object').columns:
        sample[col] = sample[col].str.strip().str.lower().fillna('NULL')
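The check below runs the suggested loop on a deterministic four-row frame with made-up values, used here in place of the random sample. The .str accessor methods skip missing values, so NaN survives strip() and lower() and is then replaced by fillna('NULL'). It is the assignment sample[col] = ... that actually changes the frame.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random sample in the question.
sample = pd.DataFrame({
    'A': pd.Series([' AaBb', np.nan, 'bbcc ', np.nan], dtype=object),
    'B': pd.to_datetime(['2019-11-30', '2020-06-30', '2018-12-31', '2019-03-31']),
})

for col in sample.select_dtypes(include='object').columns:
    # .str methods leave NaN as NaN, so fillna still sees the missing values.
    sample[col] = sample[col].str.strip().str.lower().fillna('NULL')

print(sample['A'].tolist())  # ['aabb', 'NULL', 'bbcc', 'NULL']
print(sample['A'].value_counts())
```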