Tags: python, pandas, dataframe, pandas.excelwriter

Sort filepaths according to their respective file extensions


I am trying to sort filepaths according to their respective file extensions.

I would like to have an output like this:

FileType FilePath
.h a/b/c/d/xyz.h
.h a/b/c/d/xyz1.h
.class a/b/c/d/xyz.class
.class a/b/c/d/xyz1.class
.jar a/b/c/d/xyz.jar
.jar a/b/c/d/xyz1.jar

But the output I have now looks like this: [screenshot: output in Excel]

Below is my code:

import pandas as pd
import glob

path = "The path goes here"

yes = [glob.glob(path + e, recursive=True) for e in ["/**/*.h", "/**/*.class", "/**/*.jar"]]

print(type(yes))  #File type is list
    
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class",".jar"]
print (df)

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='filepath', index=False)
writer.close()  # writer.save() was removed in pandas 2.0

Could anyone please help me with this? Thanks in advance!


Solution

  • Please try this code:

    import os
    import pathlib
    
    path = 'C:/'
    
    # Walk the tree once and group each full path under its file extension.
    processed_files = dict()
    for (root, dirs, files) in os.walk(path):
        for f in files:
            suffix = pathlib.PurePosixPath(f).suffix
            # Join with root (not path) so files in subdirectories keep their real location.
            processed_files.setdefault(suffix, []).append(os.path.join(root, f))
    
    # Print one block per extension.
    for fs in processed_files:
        print('--------------------------------')
        print(fs)
        print(processed_files[fs])
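
The code above groups paths by extension but stops at printing them. To get the two-column FileType / FilePath layout from the question, one possible sketch is shown below. The helper names `collect_by_extension` and `write_excel` are made up for illustration, and the default extension list and the `test.xlsx` / `filepath` names are simply carried over from the question. Treat it as a starting point under those assumptions rather than a finished solution.

```python
import os
from pathlib import Path


def collect_by_extension(root, extensions=(".h", ".class", ".jar")):
    """Return (extension, path) rows, grouped in the order given by `extensions`.

    Hypothetical helper: files whose extension is not listed are skipped.
    """
    order = {ext: i for i, ext in enumerate(extensions)}
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = Path(name).suffix
            if ext in order:
                rows.append((ext, os.path.join(dirpath, name)))
    # Sort by the requested extension order, then by path within each extension.
    rows.sort(key=lambda row: (order[row[0]], row[1]))
    return rows


def write_excel(rows, filename="test.xlsx"):
    """Write the rows to a two-column sheet.

    Assumes pandas and an Excel engine (openpyxl or xlsxwriter) are installed.
    """
    import pandas as pd  # imported here so the collection step works without pandas

    df = pd.DataFrame(rows, columns=["FileType", "FilePath"])
    df.to_excel(filename, sheet_name="filepath", index=False)
    return df
```

Calling `write_excel(collect_by_extension(path))` should then produce one row per file, with all `.h` files first, then `.class`, then `.jar`. Keeping one row per file (a "long" table) avoids the ragged columns that appear when each extension gets its own column.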