Tags: python, loops, for-loop, glob, pathlib

How to iterate through a directory and read in 2 files at each iteration using pathlib.Path().glob()


Using pathlib.Path().glob(), how do we iterate through a directory and read in 2 files at each iteration?

Suppose my directory C:\Users\server\Desktop\Dataset looks like this:

P1_mean_fle.csv
P2_mean_fle.csv
P3_mean_fle.csv
P1_std_dev_fle.csv
P2_std_dev_fle.csv
P3_std_dev_fle.csv

If I want to read in only 1 file at each iteration of the Pi's, my code would look like this:

from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'

for i, fle in enumerate(Path(file_path).glob(param_file)):
    mean_fle = pd.read_csv(fle).values

    results = tuning(mean_fle)  #tuning is some function which takes in the file mean 
                                #and does something with this file

Now, how do I read in 2 files at each iteration of the Pi's? The code below doesn't work because param_file can hold only one filename pattern at a time. I would appreciate a way to do this using pathlib.

from pathlib import Path
import pandas as pd

param_file = 'P*' + '_mean_fle.csv'
param_file = 'P*' + '_std_dev_fle.csv'  #this is wrong

for i, fle in enumerate(Path(file_path).glob(param_file)):  #this is wrong inside the glob() part
    mean_fle = pd.read_csv(fle).values
    std_dev_fle = pd.read_csv(fle).values

    results = tuning(mean_fle, std_dev_fle)  #tuning is some function which takes in the two files mean 
                                             #and std_dev and does something with these 2 files

Thank you in advance.


Solution

  • If your filenames follow a deterministic pattern, as in the example, your best bet is to iterate over one kind of file and derive the name of the corresponding file by string replacement.

    from pathlib import Path
    import pandas as pd
    
    file_path = r'C:\Users\server\Desktop\Dataset'
    param_file = 'P*' + '_mean_fle.csv'
    
    for fle in Path(file_path).glob(param_file):
        # Derive the matching std_dev filename, e.g. P1_mean_fle.csv -> P1_std_dev_fle.csv
        stddev_fle = fle.with_name(fle.name.replace("mean", "std_dev"))
        mean_values = pd.read_csv(fle).values
        stddev_values = pd.read_csv(stddev_fle).values
    
        results = tuning(mean_values, stddev_values)
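
If some Pi might be missing its std_dev file, the same idea can be wrapped in a small generator that only yields complete pairs. This is a minimal sketch, not part of the original answer: the helper name paired_files and its suffix parameters are hypothetical, and it assumes the naming scheme from the question (a shared prefix followed by _mean_fle.csv or _std_dev_fle.csv).

```python
from pathlib import Path


def paired_files(directory, mean_suffix="_mean_fle.csv", std_suffix="_std_dev_fle.csv"):
    """Yield (mean_path, std_dev_path) pairs that share the same prefix.

    Hypothetical helper: assumes files are named <prefix><suffix>,
    as in the question's directory listing.
    """
    directory = Path(directory)
    # sorted() gives a deterministic (lexicographic) order; glob() alone does not guarantee one
    for mean_path in sorted(directory.glob("P*" + mean_suffix)):
        # Strip the suffix from the end only, so a prefix containing "mean" is left untouched
        prefix = mean_path.name[: -len(mean_suffix)]
        std_path = mean_path.with_name(prefix + std_suffix)
        # Skip any mean file that has no std_dev counterpart
        if std_path.exists():
            yield mean_path, std_path
```

Inside the loop you would then write `for mean_path, std_path in paired_files(file_path):` and pass `pd.read_csv(mean_path).values` and `pd.read_csv(std_path).values` to tuning, as before. Slicing off the suffix rather than calling str.replace avoids surprises if "mean" ever appears elsewhere in a filename.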