I have 10 csv files. I want to copy first row from all csv files and save as new csv file, then copy second row from all csv files and save as second csv file and etc. My code in the following done only for first row and other rwos display NaN
. Where is my error?
Code
import pandas as pd
import datetime
import glob
path = r'/Jupyter_Works/new_csv'
all_files = glob.glob(path + "/*.csv")
date_time = datetime.datetime(2018, 1, 1)
index = pd.date_range(start='1/1/2018', periods= 8760, freq='H')
columns = ['Lat','Lon','Alt','Temperature','Relative Humidity','Wind speed','Wind direction','Short-wave irradiation']
dfcsv = pd.DataFrame(index=index, columns=columns)
for filename in all_files:
df = pd.read_csv(filename, index_col='time', header=0)
dfcsv.iloc[0] = df.iloc[0]
dfcsv
Result
Lat Lon Alt Temperature Relative Humidity Wind speed Wind direction Short-wave irradiation
2018-01-01 00:00:00 31.03 49.36 99 285.56 52.82 2.95 128.5 0
2018-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
First create one big DataFrame
with list comprehension and concat
, loop by unique values for select by loc
and write to files by DataFrame.to_csv
. It working, because each DataFrame has unique index, so if select by unique values then select rows with same position in all files.
path = r'/home/nickan/Jupyter_Works/new_csv'
all_files = glob.glob(path + "/*.csv")
dfs = [pd.read_csv(fp, index_col='time', parse_dates=['time']) for fp in all_files]
df = pd.concat(dfs)
for x in df.index.unique():
#removed duplicated index by index=False
df.loc[x].to_csv(f'csv/file_{x.strftime("%Y-%m-%d_%H")}.csv', index=False)
EDIT:
Because memory problems is possible use alternative solution with loop by each row in dataFrames and write in append mode:
for i, fp in enumerate(all_files):
df = pd.read_csv(fp, index_col='time', parse_dates=['time'])
for x in df.index:
f = f'out/file_{x.strftime("%Y-%m-%d_%H")}.csv'
if i == 0:
df.loc[[x]].to_csv(f, index=False)
else:
df.loc[[x]].to_csv(f, index=False,header=None, mode='a')