I have some number of CSV files in a folder. The water folder contains water_202201.csv, water_202202.csv, and water_202203.csv. I want to aggregate these three files. Of course, I can do it like the code below.
# -*- coding: utf-8 -*-
import pandas as pd

dat1 = pd.read_csv("C:/water/water_202201.csv")
dat2 = pd.read_csv("C:/water/water_202202.csv")
dat3 = pd.read_csv("C:/water/water_202203.csv")
frames = [dat1, dat2, dat3]
result1 = pd.concat(frames)
result1
But the question is: how do I aggregate the files when I don't know how many CSV files are in the water folder? I want to aggregate every CSV file inside that folder. (202201 means January 2022.)
You can use pathlib to iterate over your folder:
import pathlib

import pandas as pd

data = {}
for file in pathlib.Path('C:/water').glob('water_*.csv'):
    date = file.stem.split('_')[-1]  # extract "202201" from "water_202201"
    data[date] = pd.read_csv(file)
# One-line (dict comprehension) version
data = {file.stem.split('_')[-1]: pd.read_csv(file)
        for file in pathlib.Path('C:/water').glob('water_*.csv')}
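One caveat: glob() does not guarantee any particular order, so if the monthly files should end up concatenated chronologically, sort the paths first. A minimal sketch with hypothetical file names (the date suffix sorts chronologically because it is zero-padded YYYYMM):

```python
import pathlib

# Hypothetical file names standing in for the glob() result;
# glob() order is arbitrary on some platforms.
names = [pathlib.Path(p) for p in
         ["water_202203.csv", "water_202201.csv", "water_202202.csv"]]

# Sorting the paths sorts the YYYYMM suffixes chronologically.
ordered = sorted(names)
months = [p.stem.split('_')[-1] for p in ordered]
```

In the real loop you would write `for file in sorted(pathlib.Path('C:/water').glob('water_*.csv')):` instead.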
Now there are two possibilities.

Prefix the index with the date key:
df = pd.concat(data)
Without the prefix:
df = pd.concat(data.values())
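To see the difference between the two calls, here is a minimal sketch with tiny in-memory frames standing in for the CSV contents (column name and values are made up):

```python
import pandas as pd

# Hypothetical data standing in for two monthly CSV files.
data = {
    "202201": pd.DataFrame({"usage": [10, 12]}),
    "202202": pd.DataFrame({"usage": [11, 13]}),
}

# Dict keys become the outer level of a MultiIndex, so every row
# is tagged with the month it came from.
with_keys = pd.concat(data)

# Plain concatenation: each frame keeps its own 0..n row index,
# so row labels repeat across months.
without_keys = pd.concat(data.values())

# If you just want one continuous row index, reset it entirely.
flat = pd.concat(data.values(), ignore_index=True)
```

The first form is usually the most useful here, since the month survives into the result and you can later select one month with `df.loc["202201"]`.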