I am trying to sort file paths by their file extensions.
I would like output like this:
FileType | FilePath |
---|---|
.h | a/b/c/d/xyz.h |
.h | a/b/c/d/xyz1.h |
.class | a/b/c/d/xyz.class |
.class | a/b/c/d/xyz1.class |
.jar | a/b/c/d/xyz.jar |
.jar | a/b/c/d/xyz1.jar |
But the output I get in Excel now has one column per extension instead (.h, .class and .jar side by side, each listing the paths of that type).
Below is my code:
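```python
import glob

import pandas as pd

path = "The path goes here"
# One glob per extension; '**' with recursive=True searches every subdirectory
yes = [glob.glob(path + e, recursive=True) for e in ["/**/*.h", "/**/*.class", "/**/*.jar"]]
print(type(yes))  # list (of three lists, one per extension)
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class", ".jar"]
print(df)
# ExcelWriter.save() was removed in pandas 2.0; the context manager closes and saves the file
with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='filepath', index=False)
```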
Could anyone please help me with this? Thanks in advance!
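Because the wide frame already holds every path, one minimal way to reach the requested layout is to unpivot it with `DataFrame.melt`, which turns each (column, value) pair into a row. This is a sketch under the assumption that `df` is the transposed frame from the question, with columns `.h`, `.class`, `.jar`; the helper name `wide_to_long` is hypothetical.

```python
import pandas as pd


def wide_to_long(df):
    """Unpivot a one-column-per-extension frame into FileType / FilePath rows."""
    # melt stacks the columns in order: all '.h' rows, then '.class', then '.jar'
    long_df = df.melt(var_name="FileType", value_name="FilePath")
    # DataFrame(yes).transpose() pads the shorter lists with None; drop those filler rows
    return long_df.dropna(subset=["FilePath"]).reset_index(drop=True)
```

Writing the result with `wide_to_long(df).to_excel(writer, sheet_name='filepath', index=False)` would then produce the two-column sheet, with extensions grouped in the same order as the original columns.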
Please try this code:
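```python
import os
import pathlib

path = 'C:/'
full_file_paths = []
file_suffix = []
# os.walk visits every directory under `path`, yielding (dirpath, dirnames, filenames)
for (root, dirs, files) in os.walk(path):
    for f in files:
        file_suffix.append(pathlib.PurePath(f).suffix)
        # Join with `root` (the current directory), not `path`, so nested files keep their real location
        full_file_paths.append(os.path.join(root, f))

file_suffix = set(file_suffix)
processed_files = dict()
for fs in file_suffix:
    processed_files[fs] = []
    for f in full_file_paths:
        # Compare the actual suffix; a substring search such as f.find('.h')
        # would also match files like 'page.html'
        if pathlib.PurePath(f).suffix == fs:
            processed_files[fs].append(f)
    print('--------------------------------')
    print(fs)
    print(processed_files[fs])
```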
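The answer prints one list per extension but stops short of the two-column table the question asks for. Assuming `processed_files` is the dict built in that code (suffix mapped to a list of full paths), a minimal sketch of the remaining step could look like this; the helper name `suffix_dict_to_frame` is hypothetical.

```python
import pandas as pd


def suffix_dict_to_frame(processed_files):
    """Flatten {suffix: [paths]} into a FileType / FilePath DataFrame."""
    # One (suffix, path) tuple per file; suffixes with no files contribute no rows
    rows = [(suffix, p) for suffix, paths in processed_files.items() for p in paths]
    df = pd.DataFrame(rows, columns=["FileType", "FilePath"])
    # Group by extension, then order the paths within each group
    return df.sort_values(["FileType", "FilePath"], ignore_index=True)
```

Note that the sort is alphabetical, so `.class` rows come before `.h` rows; if the exact order `.h`, `.class`, `.jar` matters, build the rows by iterating over that list of extensions instead of over the dict. The frame can then be saved with `to_excel(..., index=False)` as in the question.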