I have a dataframe like as shown below
obs = pd.DataFrame({'person_id' :[1,2,3],'obs_date':['12/31/2007','11/25/2009',np.nan],
'hero_id':[2,4,np.nan],'date':['12/31/2017',np.nan,'10/06/2015'],
'heroine_id':[1,np.nan,5],'datetime':['12/31/2027','11/25/2029',np.nan],
'bud_source_value':[1250000,250000,np.nan],
'prod__source_value':[10000,20000,np.nan]})
The objective is to fill 3 default values based on column names.
cols ending with id - fillna with 0
cols ending with value - fillna with np.nan (leave it as is)
cols containing date[pandas datetime format] - fillna with 12/31/2000
When I tried the below
col = obs.columns.str
c1 = col.endswith('id')
c2 = col.contains('value')
c3 = col.endswith('date')
c4 = col.endswith('datetime')
filled_values = np.select([c1,c2,c3,c4], [obs.fillna(0), obs.fillna(np.nan), obs.fillna("1900-01-01"), obs.fillna("1900-01-01 00:00:00")])
obs= pd.DataFrame(filled_values,columns=cols)
But I get an error message like as below
typeerror u10 cannot be converted to an IntegerDtype pandas
Additionally, in one of my functions above, I use a piece of code like below
testdf['hero_id'] = testdf['hero_id'].astype('float').astype('Int64')
Not sure whether the above line is causing the error. I am not sure how to reproduce/debug this error. Can help me avoid this error please
Idea is create dictionary by columns names with replaced values and pass to DataFrame.fillna
:
c1 = dict.fromkeys(obs.filter(regex='id$').columns, 0)
c2 = dict.fromkeys(obs.filter(like='value').columns, np.nan)
c3 = dict.fromkeys(obs.filter(regex='date$').columns, "1900-01-01")
c4 = dict.fromkeys(obs.filter(regex='datetime$').columns, "1900-01-01 00:00:00")
d = {**c1, **c2, **c3, **c4}
print (d)
{'person_id': 0, 'hero_id': 0, 'heroine_id': 0,
'bud_source_value': nan, 'prod__source_value': nan,
'obs_date': '1900-01-01', 'date': '1900-01-01',
'datetime': '1900-01-01 00:00:00'}
obs = obs.fillna(d)
print (obs)
person_id obs_date hero_id date heroine_id \
0 1 12/31/2007 2.0 12/31/2017 1.0
1 2 11/25/2009 4.0 1900-01-01 0.0
2 3 1900-01-01 0.0 10/06/2015 5.0
datetime bud_source_value prod__source_value
0 12/31/2027 1250000.0 10000.0
1 11/25/2029 250000.0 20000.0
2 1900-01-01 00:00:00 NaN NaN