I am trying to do something very basic: compute the sum of two cells in a .csv file and output it into a new DataFrame. I repeat this for every row in that .csv file, and for every file in a folder. After all this, I output the DataFrame to a .xlsx file. The main body of the code is below:
    for fname in glob.glob(path):
        print(fname)
        processed = []
        df = pd.read_csv(fname)
        for index, row in df.iterrows():
            processed.append(row['Rejected'] + row['Sorted'])
        heatMap[str(counter)] = processed
        counter += 1
    newfname = 'Output.xlsx'
    heatMap.to_excel(newfname)
However, when I look at my newly created DataFrame, the columns are out of order. Inspecting the console, I can see that the files are iterated through in alphanumeric order.
I was wondering how my method can be adjusted so that I can iterate through the files in a natural sort order (1, 2, 3, 4, 5 etc.), so I don't have to change the name of each file.
Thank you!
    for fname in sorted(glob.glob(path)):
        ...
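glob.glob returns a list of paths in no guaranteed order. Passing that list to Python's built-in sorted function returns a new list sorted as strings, so you can then loop through it in alphabetical order. Note that this is still not a natural sort: compared as strings, "10" comes before "2".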
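To make the difference concrete, here is a small sketch of what plain sorted does to numbered file names. The names in the list are hypothetical and only stand in for whatever glob.glob(path) returns in your folder.

```python
# Hypothetical file names, used only for illustration.
names = ["10.csv", "2.csv", "1.csv"]

# sorted() compares strings character by character, so "10.csv"
# lands before "2.csv" because "1" sorts before "2".
lexicographic = sorted(names)
print(lexicographic)  # ['1.csv', '10.csv', '2.csv']
```

So sorted alone gives a stable, repeatable order, but not the 1, 2, 3, ... order asked for in the question.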
For a natural sort, there is the third-party natsort package (pip install natsort).
    from natsort import natsorted
    for fname in natsorted(glob.glob(path)):
        ...
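If installing a package is not an option, a natural ordering can be approximated with the standard library alone. The sketch below is not how natsort works internally; natural_key is a hypothetical helper name, and the file names are made up for illustration. It splits each name into digit and non-digit runs and compares the digit runs as integers.

```python
import re

def natural_key(name):
    """Sort key that compares runs of digits as integers (hypothetical helper)."""
    # re.split with a capturing group alternates non-digit and digit runs,
    # so the odd-indexed parts are always the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Hypothetical file names, used only for illustration.
names = ["10.csv", "2.csv", "1.csv", "file20.csv", "file3.csv"]

ordered = sorted(names, key=natural_key)
print(ordered)  # ['1.csv', '2.csv', '10.csv', 'file3.csv', 'file20.csv']
```

In the loop from the question this would be used as sorted(glob.glob(path), key=natural_key). For anything beyond simple numbered names (signs, decimals, locale rules), natsort is likely the safer choice.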