I have a dataframe which has two dtypes: Object (was expecting string) and Datetime (expected datetime). I don't understand this behavior and why it affects my fillna().
Calling .fillna() with inplace=True wipes the data denoted as int64 despite being changed with .astype(str)
Calling .fillna() without it does nothing.
I know pandas / numpy dtypes differ from the native Python types, but is this the correct behavior, or am I getting something terribly wrong?
sample:
import random
import numpy as np
import pandas as pd
sample = pd.DataFrame({'A': [random.choice(['aabb',np.nan,'bbcc','ccdd']) for x in range(15)],
'B': [random.choice(['2019-11-30','2020-06-30','2018-12-31','2019-03-31']) for x in range(15)]})
sample.loc[:, 'B'] = pd.to_datetime(sample['B'])
for col in sample.select_dtypes(include='object').columns.tolist():
    sample.loc[:, col].astype(str).apply(lambda x: str(x).strip().lower()).fillna('NULL')
for col in sample.columns:
    print(sample[col].value_counts().head(15))
    print('\n')
Here neither 'NULL' nor 'nan' appears in the output. I also added .replace('nan','NULL'), but still nothing. Can you give me a clue what to look for, please? Many thanks.
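The problem here is that the missing values are converted to strings (NaN becomes the text 'nan'), so fillna has nothing left to fill. The solution is to use the pandas methods Series.str.strip and Series.str.lower, which handle missing values nicely, and to assign the result back to the column: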
for col in sample.select_dtypes(include='object').columns:
    sample[col] = sample[col].str.strip().str.lower().fillna('NULL')
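
To see the difference on a small scale, here is a minimal sketch. The three-element Series `s` and the names `stringified` and `cleaned` are made up for illustration and are not part of the question's data. It compares the element-wise str(x) call from the question with the .str accessor:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value (not the question's data)
s = pd.Series([' AaBb ', np.nan, 'bbCC'], dtype=object)

# str(x) turns NaN into the text 'nan', so no value is missing afterwards
# and fillna has nothing to replace
stringified = s.apply(lambda x: str(x).strip().lower()).fillna('NULL')

# The .str methods leave NaN untouched, so fillna can still find it
cleaned = s.str.strip().str.lower().fillna('NULL')

print(stringified.tolist())
print(cleaned.tolist())
```

In the first result the missing entry shows up as the string 'nan', while in the second it is filled with 'NULL'. Either way, the result has to be assigned back to the column, otherwise the original DataFrame stays unchanged.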