Tags: python, dataframe, csv, iteration, glob

How can one iterate through the files in a folder in natural sort order using glob.glob(path)?


I am trying to do something very basic: compute the sum of two cells in a .csv file and write it into a new DataFrame. I repeat this for every row in that .csv file and for every file in a folder, then write the DataFrame to an .xlsx file. The main body of the code is below:

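import glob
import pandas as pd

# Assumed setup: the original excerpt does not show these definitions.
# path is the glob pattern, defined elsewhere.
heatMap = pd.DataFrame()
counter = 0

for fname in glob.glob(path):
    print(fname)
    processed = []  # one sum per row of the current file
    df = pd.read_csv(fname)
    for index, row in df.iterrows():
        processed.append(row['Rejected'] + row['Sorted'])
    heatMap[str(counter)] = processed  # one column per file
    counter += 1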

newfname = 'Output.xlsx'
heatMap.to_excel(newfname)

However, when I look at my newly created DataFrame, the columns are out of order. Inspecting the console, I can see that the files are iterated through in alphanumeric order.

Console output

I was wondering how my method can be adjusted so that I can iterate through the files in a natural sort order (1, 2, 3, 4, 5 etc.), so I don't have to change the name of each file.

Thank you!


Solution

    for fname in sorted(glob.glob(path)):
        ...

    glob.glob(path) already returns a list, in arbitrary order, so it can be passed directly to Python's built-in sorted() function. The loop then visits the files in alphabetical (lexicographic) order, where "10" still comes before "2".
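
To see why plain sorted() does not give the order asked for, here is a small sketch. The filenames are made up for illustration and are not taken from the question.

```python
# Hypothetical filenames, standing in for what glob.glob(path) might return.
names = ["run10.csv", "run2.csv", "run1.csv"]

# sorted() compares strings character by character, so "run10" precedes "run2".
lexicographic = sorted(names)
print(lexicographic)
```

Because "1" compares lower than "2" as a character, run10.csv ends up between run1.csv and run2.csv.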

    For natural sort order, use the third-party natsort package (pip install natsort).

    import glob
    from natsort import natsorted

    # natsorted orders embedded numbers by value: 1, 2, ..., 10
    for fname in natsorted(glob.glob(path)):
        ...
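
If installing a third-party package is not an option, a natural sort key can be sketched with the standard library alone. The helper name natural_key and the sample filenames are illustrative assumptions. This simple version only handles runs of ASCII digits and is not a full replacement for natsort.

```python
import re

def natural_key(name):
    """Split a string into text and integer chunks so digit runs compare numerically."""
    # re.split with a capturing group keeps the digit runs in the result,
    # always at the odd positions of the returned list.
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Hypothetical filenames, standing in for what glob.glob(path) might return.
names = ["run10.csv", "run2.csv", "run1.csv"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

In the loop from the question this would be used as `for fname in sorted(glob.glob(path), key=natural_key):`.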