
How to create a dict of dataframes from multiple csv files


I am loading a csv file in pandas as

premier10 = pd.read_csv('./premier_league/pl_09_10.csv')

However, I have 20+ csv files, which I was hoping to load as separate dfs (one df per csv) using a loop and predefined names, something similar to:

import pandas as pd
file_names = ['pl_09_10.csv','pl_10_11.csv']
names = ['premier10','premier11']
for i in range (0,len(file_names)):
     names[i] = pd.read_csv('./premier_league/{}'.format(file_names[i]))

(Note: here I provide only two csv files as an example.) Unfortunately, this doesn't work (no error messages, but the pd dfs don't exist).

Any tips/links to previous questions would be greatly appreciated, as I haven't found anything similar on Stack Overflow.


Solution

    1. Use pathlib to set a Path, p, to the files.
    2. Use the .glob method to find the files matching the pattern.
    3. Create a dataframe with pandas.read_csv:
      • Use a dict comprehension to create a dict of dataframes, where each file has its own key-value pair. Use the dict like any other dict; the keys are the file stems (the file names without the extension) and the values are the dataframes.
      • Alternatively, use a list comprehension with pandas.concat to create a single dataframe from all the files.
    from pathlib import Path
    import pandas as pd
    
    # set the path to the files
    p = Path('some_path/premier_league')  
    
    # create a list of the files matching the pattern
    files = list(p.glob('pl_*.csv'))
    
    # creates a dict of dataframes, where each file has a separate dataframe
    df_dict = {f.stem: pd.read_csv(f) for f in files}  
    
    # alternative, creates 1 dataframe from all files
    df = pd.concat([pd.read_csv(f) for f in files])
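For instance, here is the same pattern end to end, using two small hypothetical CSV files (the column names and values are made up for illustration). An individual season is looked up by its file stem, and the dict's values can be concatenated into one dataframe:

```python
from pathlib import Path
import tempfile

import pandas as pd

# hypothetical sample data: write two small CSVs to a temp directory
p = Path(tempfile.mkdtemp())
for name in ['pl_09_10.csv', 'pl_10_11.csv']:
    (p / name).write_text('team,points\nArsenal,75\nChelsea,86\n')

# same pattern as the solution: a dict of dataframes keyed by file stem
files = list(p.glob('pl_*.csv'))
df_dict = {f.stem: pd.read_csv(f) for f in files}

# look up one season's dataframe by its key (the file name minus '.csv')
premier10 = df_dict['pl_09_10']
print(premier10.shape)  # (2, 2)

# or combine every season into a single dataframe with a fresh index
all_seasons = pd.concat(df_dict.values(), ignore_index=True)
print(all_seasons.shape)  # (4, 2)
```

Using the file stem as the key means you keep the association between each dataframe and the file it came from, which the list-of-variables approach in the question loses.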