I want to concatenate several CSV files that I saved in the directory ./Errormeasure. To do so, I used the following answer from another thread: https://stackoverflow.com/a/51118604/9109556
from os import listdir
import pandas as pd

filepaths = [f for f in listdir('./Errormeasure') if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))
print(df)
However, this code only works when the CSV files I want to concatenate are in both the ./Errormeasure directory and the directory below, ./venv. This is obviously not convenient.
When the CSV files are only in ./Errormeasure, I receive the following error:
FileNotFoundError: [Errno 2] File b'errormeasure_871687110001543570.csv' does not exist: b'errormeasure_871687110001543570.csv'
Can you give me some tips to tackle this problem? I am using PyCharm. Thanks in advance!
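os.listdir() returns only file names, without the parent folder. pandas.read_csv() needs a path it can resolve, either relative (to the directory the pandas script runs from) or absolute, so a bare file name is only found when the file happens to sit in that directory.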
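To illustrate the point, here is a minimal sketch of how the file names from os.listdir() could be joined back onto their folder before being handed to pandas.read_csv(). The helper name list_csv_paths is hypothetical, not something from the question or from pandas.

```python
import os


def list_csv_paths(directory):
    """Return sorted paths of the .csv files directly inside `directory`."""
    return sorted(
        # os.listdir() yields bare names such as 'a.csv';
        # os.path.join() prepends the folder so the path can be resolved.
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".csv")
    )
```

With this, the question's snippet would read pd.concat(map(pd.read_csv, list_csv_paths('./Errormeasure'))), assuming the script is started from the folder that contains ./Errormeasure.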
Instead, consider the recursive feature of the built-in glob module (only available in Python 3.5+), which returns the paths, including the folder, of all CSV files at the top level and in subfolders.
import glob

dirpath = './Errormeasure'  # folder from the question
for f in glob.glob(dirpath + "/**/*.csv", recursive=True):
    print(f)
From there, build the data frames in a list comprehension (bypassing map; see List comprehension vs map) and concatenate them with pd.concat:
df_files = [pd.read_csv(f) for f in glob.glob(dirpath + "/**/*.csv", recursive=True)]
df = pd.concat(df_files)
print(df)
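A slightly more defensive variant of the same glob approach might look like the sketch below. The function name concat_csvs, the sorting of the paths, the explicit check for an empty result, and ignore_index=True are all assumptions added for illustration, not part of the original answer.

```python
import glob
import os

import pandas as pd


def concat_csvs(dirpath):
    """Concatenate every .csv file under `dirpath`, including subfolders."""
    pattern = os.path.join(dirpath, "**", "*.csv")
    # sorted() makes the row order reproducible; glob's order is not guaranteed.
    paths = sorted(glob.glob(pattern, recursive=True))
    if not paths:
        # pd.concat raises ValueError on an empty list, so fail with a clearer message.
        raise FileNotFoundError(f"no .csv files found under {dirpath!r}")
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True renumbers the rows instead of repeating each file's own index.
    return pd.concat(frames, ignore_index=True)
```

Calling concat_csvs('./Errormeasure') would then return a single data frame, provided the files share the same columns.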
For Python < 3.5, consider os.walk() + os.listdir() to retrieve the full paths of the CSV files:
import os
import pandas as pd

dirpath = './Errormeasure'  # folder from the question

# COMBINE CSVs IN CURR FOLDER + SUB FOLDERS
fpaths = [os.path.join(dirpath, f)
          for f in os.listdir(dirpath) if f.endswith('.csv')] + \
         [os.path.join(fdir, fld, f)
          for fdir, flds, ffile in os.walk(dirpath)
          for fld in flds
          for f in os.listdir(os.path.join(fdir, fld)) if f.endswith('.csv')]

# READ EACH CSV AND STACK THE RESULTING DATA FRAMES
df = pd.concat([pd.read_csv(f) for f in fpaths])
print(df)