Search code examples
python-3.xpandascsvconcatenation

Concatenate all csv files in a folder using pandas and remove leading comma


I'm trying to write a basic program to concatenate a bunch of csv files in a folder, into a single file. Hopefully, with the column header displaying only once. All the files are arranged in the same way regarding numbers of columns and headings. The issue is that the output file starts with a ',' and thus pushes the headings across 1 column. All the data appears fine.

The below is what I've tried using:

import pandas as pd
import glob
from tkinter.filedialog import askdirectory

path = askdirectory() + '/'
files = glob.glob("*.csv")
all_files = [path + f for f in files]

flist = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    flist.append(df)

df_out = pd.concat(flist, axis=0, ignore_index=False)

df_out.to_csv("output.csv")

Which give an output.csv that look like this:

,DATE,PM2.5,PM10,CO2,TEMPERATURE,HUMIDITY
2023-02-20 10:01:13,6.4,7.1,920,18.8,78.8,
2023-02-20 10:02:13,6.4,7.5,911,18.9,78.7,
2023-02-20 10:03:13,5.5,6.4,901,19.1,78.0,
2023-02-20 10:04:13,5.8,6.7,889,19.3,77.3,

I'm sure that I'm misunderstanding how to deal with the concat arguments, but no amount of playing seems to fix the problem. If I use ignore_index=True, it just changes the date column into indexed row numbers. Any ideas how to fix this?

EDIT: the input csv looks like this:

DATE,PM2.5,PM10,CO2,TEMPERATURE,HUMIDITY
2023-02-20 10:01:13,006.4,007.1,920,018.8,078.8,
2023-02-20 10:02:13,006.4,007.5,911,018.9,078.7,
2023-02-20 10:03:13,005.5,006.4,901,019.1,078.0,
2023-02-20 10:04:13,005.8,006.7,889,019.3,077.3,

Solution

  • When you read each csv file, using index_col=False instead of index_col=None should solve your problem:

    flist = []
    for filename in all_files:
        df = pd.read_csv(filename, index_col=False)  # <- HERE
        flist.append(df)
    
    df_out = pd.concat(flist, axis=0, ignore_index=False)
    df_out.to_csv("output.csv")