I have more than 15 csv files. The csv file 1 to file 4
when loaded as a dataframe look like as shown below
df = pd.DataFrame({'person_id': [101,101,101,101],
'Date':['06/08/2014 08:00:00 AM','06/01/2014 05:00:00 AM','06/03/2014 08:00:00 AM','06/09/2014 05:00:00 AM'],
'Test':['A1','B2','B5','B6']})
The csv file 5 to file 15
look like as shown below
df1 = pd.DataFrame({'person_id': [201,201,201,201],
'Date':[np.nan,np.nan,'06/03/2014 08:00:00 AM','06/09/2014 05:00:00 AM'],
'Test':['A1','B2','B5','B6'],
'date_backup':['03/01/2014 08:00:00 AM','10/08/2014 08:00:00 AM','09/09/2014 08:00:00 AM',np.nan]})
As you can see from sample dataframe df1
(of file 5 to file 15), that I have two date
columns
Now from file 5 to file 15
, I would like to fillna in those date_backup
or Date
by copying value from the other.
For example, I would like to do the below
df1.Date.fillna(df.date_backup, inplace=True)
df1.date_backup.fillna(df.Date, inplace=True)
Now, am not sure how can I do the above fillna()
operation in the below code
I tried the below but not sure whether it is efficient and elegant
pat_dir = ['Path1\path2\La*.csv','Path3\Path4\20*.csv']
files_grabbed = []
for files in pat_dir:
files_grabbed.extend(glob.glob(files))
d = {'date_backup':'Date'}
cols = ['Date','Test','value','person_id']
dfs = [pd.read_csv(f, sep=",",low_memory=False).rename(columns=d).reindex(columns=cols)
for f in files_grabbed] #thanks to Jezrael for this code
In the code above, you can see that I am renaming the date_backup
column.
How can I fillna() efficiently and elegantly
I expect my output to be concatenated/appended dataframe of 15 csv files like as shown below (with one Date column). My only question is how can I fillna() before appending dataframes, so I will have only Date
column
If need fillna columns in one DataFrame is possible use if
statement for fillna
if exist column date_backup
:
dfs= []
for f in files_grabbed:
df1 = pd.read_csv(f, sep=",",low_memory=False)
if 'date_backup' in df1.columns:
df1.Date = df1.Date.fillna(df1.date_backup)
#not sure if necessary rename and reindex
df1 = df1.rename(columns=d).reindex(columns=cols)
dfs.append(df1)
Another idea is add date_backup
for each DataFrame
and fillna
after concat
:
cols = ['Date','Test','value','person_id', 'date_backup']
dfs = [pd.read_csv(f, sep=",",low_memory=False).rename(columns=d).reindex(columns=cols)
for f in files_grabbed]
df = pd.concat(dfs)
df.Date = df.Date.fillna(df.pop('date_backup'))