I'm trying to write a basic program to concatenate a bunch of csv files in a folder, into a single file. Hopefully, with the column header displaying only once. All the files are arranged in the same way regarding numbers of columns and headings. The issue is that the output file starts with a ',' and thus pushes the headings across 1 column. All the data appears fine.
The below is what I've tried using:
import pandas as pd
import glob
from tkinter.filedialog import askdirectory
path = askdirectory() + '/'
files = glob.glob("*.csv")
all_files = [path + f for f in files]
flist = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
flist.append(df)
df_out = pd.concat(flist, axis=0, ignore_index=False)
df_out.to_csv("output.csv")
Which give an output.csv that look like this:
,DATE,PM2.5,PM10,CO2,TEMPERATURE,HUMIDITY
2023-02-20 10:01:13,6.4,7.1,920,18.8,78.8,
2023-02-20 10:02:13,6.4,7.5,911,18.9,78.7,
2023-02-20 10:03:13,5.5,6.4,901,19.1,78.0,
2023-02-20 10:04:13,5.8,6.7,889,19.3,77.3,
I'm sure that I'm misunderstanding how to deal with the concat arguments, but no amount of playing seems to fix the problem. If I use ignore_index=True, it just changes the date column into indexed row numbers. Any ideas how to fix this?
EDIT: the input csv looks like this:
DATE,PM2.5,PM10,CO2,TEMPERATURE,HUMIDITY
2023-02-20 10:01:13,006.4,007.1,920,018.8,078.8,
2023-02-20 10:02:13,006.4,007.5,911,018.9,078.7,
2023-02-20 10:03:13,005.5,006.4,901,019.1,078.0,
2023-02-20 10:04:13,005.8,006.7,889,019.3,077.3,
When you read each csv file, using index_col=False
instead of index_col=None
should solve your problem:
flist = []
for filename in all_files:
df = pd.read_csv(filename, index_col=False) # <- HERE
flist.append(df)
df_out = pd.concat(flist, axis=0, ignore_index=False)
df_out.to_csv("output.csv")