Tags: python, csv, dictionary, pandas, filelist

Loop over file list into dictionary; based on string


I have a bunch of .csv files in a directory, and I would like to pull out their contents (each is an NxM matrix) and put them into a dictionary. They are all the same size and generically named {means0, means1, ...} and {trajectories0, trajectories1, ...}.

This is the bit of code I use to get the file list:

import os
import glob

my_dir = 'insert your own datapath'
os.chdir(my_dir)
# sorted() because glob does not guarantee any particular order
filelist = sorted(glob.glob("*.csv"))

which outputs

['means0.csv',
 'means1.csv',
 'means2.csv',
 'trajectories0.csv',
 'trajectories1.csv',
 'trajectories2.csv']

I am looking for a bit of code that will

  1. Extract the names; in this case "means" and "trajectories"
  2. Create a dict for each name, e.g. means_dict = {}
  3. Fill each dict with the relevant .csv files, e.g. ending up with something like means_dict['0'] = 'means0.csv', etc.

Hope it makes sense!


Solution

  • You really don't want to create dicts dynamically, i.e. generate variable names like means_dict at runtime. Instead, use a single containing dict whose keys are the names ("means", "trajectories", etc.) and whose values are lists of the matching files:

    from collections import defaultdict
    import glob
    import re
    
    filedict = defaultdict(list)
    for filename in glob.glob("*.csv"):
        # capture the leading run of non-digits, e.g. "means" from "means0.csv"
        result = re.match(r'([^\d]+)', filename)
        if result:
            filedict[result.group(1)].append(filename)
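
If you also want to look files up by their number, as in the means_dict['0'] example from the question, a nested dict is one possible variation on the same idea. The following is only a sketch: the helper name group_by_name_and_index is made up for illustration, and it assumes every file of interest is named <name><digits>.csv; anything else is skipped.

```python
import re
from collections import defaultdict


def group_by_name_and_index(filenames):
    """Map e.g. 'means0.csv' to grouped['means']['0'] = 'means0.csv'."""
    # Assumed naming scheme: non-digit name, then digits, then ".csv"
    pattern = re.compile(r'(\D+?)(\d+)\.csv$')
    grouped = defaultdict(dict)
    for filename in filenames:
        match = pattern.match(filename)
        if match:  # skip files that do not fit the assumed scheme
            name, index = match.groups()
            grouped[name][index] = filename
    return grouped


# Hard-coded list standing in for the glob result from the question
filelist = ['means0.csv', 'means1.csv', 'trajectories0.csv']
grouped = group_by_name_and_index(filelist)
print(grouped['means']['0'])
```

The keys are kept as strings to match the question; converting with int(index) would work just as well. To store the matrices rather than the filenames, the assignment could hold the loaded contents instead, for instance pandas.read_csv(filename, header=None), assuming the files have no header row.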