Tags: python, csv, glob, data-analysis

How to analyze multiple .csv files whose names depend on timestamps in a single Python script?


I have some weekly .csv files that are named after the start and end date and time of the week they cover, for example:

File_2018-01-01_05-30-00_2018-01-08_02-00-00

I want to analyze them using a single Python script. My idea was to loop over the .csv files in the folder and run the rest of the code on each one.

I know it is possible to concatenate several .csv files into a single one, but my computer cannot handle that many at once, and I am interested in the results for each period separately.

Is there any way of using the glob function/library if the names are so different?


Solution

  • Assuming you are using Python 3.x, you can use glob.glob() to let you iterate over all suitable filenames as follows:

    import glob
    import csv
    
    for filename in glob.glob("File_*.csv"):
        print("Processing '{}'".format(filename))
    
        # newline='' lets the csv module handle line endings itself
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
    
            for row in csv_input:
                print(row)
    
        print()  # blank line between files
    

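    This example finds every CSV file in the current working directory whose name starts with File_, prints the filename, and then prints every row in the file. The * wildcard matches any sequence of characters, so it does not matter that the timestamps differ from file to file. For example, if you had a CSV file called File_2018-01-01_05-30-00_2018-01-08_02-00-00.csv containing: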

    col1,col2
    a,b
    c,d
    

    The script would display:

    Processing 'File_2018-01-01_05-30-00_2018-01-08_02-00-00.csv'
    ['col1', 'col2']
    ['a', 'b']
    ['c', 'd']    
    

    This is repeated for every other matching filename in the same folder.

    If you are using Python 2.x, the open() call needs to use binary mode instead of newline='', so you would need to change that line to:

    with open(filename, 'rb') as f_input:
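
Since the results are wanted per period, the loop above can be extended to read the start and end timestamps out of each filename and keep one result per file. The following is only a sketch for Python 3.7+ under some assumptions: the filenames follow exactly the pattern shown in the question plus a .csv extension, and `parse_period`, `count_rows` and `analyze_folder` are hypothetical helper names. `count_rows` is a stand-in for whatever analysis you actually want to run.

```python
import csv
import glob
import os
import re
from datetime import datetime

# Assumed naming scheme, taken from the example in the question:
# File_<start date>_<start time>_<end date>_<end time>.csv
NAME_PATTERN = re.compile(
    r"^File_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})"
    r"_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.csv$"
)
STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_period(filename):
    """Return (start, end) datetimes from a filename, or None if it does not match."""
    match = NAME_PATTERN.match(os.path.basename(filename))
    if match is None:
        return None
    start, end = (datetime.strptime(s, STAMP_FORMAT) for s in match.groups())
    return start, end


def count_rows(filename):
    """Placeholder analysis (hypothetical): count the rows after the header."""
    with open(filename, newline="") as f_input:
        reader = csv.reader(f_input)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


def analyze_folder(folder="."):
    """Process one file at a time and keep a separate result per period."""
    results = {}
    # The timestamps are zero-padded, so sorting the names sorts by start date.
    for filename in sorted(glob.glob(os.path.join(folder, "File_*.csv"))):
        period = parse_period(filename)
        if period is None:
            continue  # matched File_*.csv but not the timestamp pattern
        results[period] = count_rows(filename)
    return results


if __name__ == "__main__":
    for (start, end), n_rows in analyze_folder().items():
        print("{} to {}: {} rows".format(start, end, n_rows))
```

Only one file is open at any time, so memory use does not grow with the number of weekly files, and the dictionary keys give you the period each result belongs to.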