Search code examples
pythonpandascsvexport-to-csv

Aggregating the unknown csv files for a particular folder


I have some number of csv files in a folder. waterfolder has water_202201.csv , water_202202.csvand water_202203.csv.I want to aggregate these 3 files. Of course, I can do like the code below.

import pandas as pd
import numpy as np
#-*-coding:utf-8-*-
dat1=pd.read_csv("C:/water/water_202201.csv")
dat2=pd.read_csv("C:/water/water_202202.csv")
dat3=pd.read_csv("C:/water/water_202202.csv")
frames=[data1,data2,data3]
result1=pd.concat(frames)
result1

But the question is how to aggregate if I don't know how many csv files are in the waterfolder and somehow I want to aggregate every csv files inside that folder. 202201 means 2022 January


Solution

  • You can use pathlib to iterate over your folder:

    import pathlib
    
    data = {}
    for file in pathlib.Path('C:/water').glob('water_*.csv'):
        date = file.stem.split('_')[-1]  # extract 202201 for water_202201.csv
        df = pd.read_csv(file)
        data[date] = df
    
    # One-line version
    data = {file.stem.split('_')[1]: pd.read_csv(file) 
               for file in pathlib.Path('./data/water').glob('water_*.csv')}
    

    Now 2 possibilities:

    Prefix the index by the date key:

    df = pd.concat(data)
    

    Without prefix:

    df = pd.concat(data.values())