
Creating folders from file names using only part of the file name


I would like to move the files in a directory into new folders named after a specific string taken from the file names. Here are examples of the file names I have in the directory (out of thousands):

WS1_APAL4900.pdf
WS1_APAL4900_A.pdf
WS1_APAL4900_B.pdf
WS1_APAL4900_C.pdf
WS1_CANM0901.pdf
WS1_CANM0901_A.pdf
WS2_CANM0901.pdf
WS2_CANM0901_A.pdf
WS3_CANM0901.pdf
WS3_CANM0901_A.pdf
WS3_CONT6565.pdf

My goal is to split each file name on the underscore delimiter(s) into 2 or 3 strings, take only the second one from the left (an 8-character string of letters and digits), and create a new folder named after it, for example 'MFMO1720', whether the name has 2 or 3 parts. All files that share the same 8 characters to the right of the first underscore, whether or not they have _A, _B, etc. in their name, should be moved into the folder with that 8-character name.

For now, when I run the code, the files with '_x' are all gathered under one folder (i.e. the 'APAL4900' folder holds 'WS1_APAL4900_A.pdf', 'WS1_APAL4900_B.pdf' and 'WS1_APAL4900_C.pdf', but not 'WS1_APAL4900.pdf'). Files that do NOT have _A, _B, _C, etc. go into a folder that has the .pdf extension in its name, i.e. 'APAL4900.pdf' holds only one file (WS1_APAL4900.pdf).

I tried split() and rsplit() and also other splitting methods, but none of them helps me get all the files with the same 8-character number to the same folder.

Any help would be appreciated!

Here is the code:


import glob
import os
import shutil

folder = 'C:/test'

for file_path in glob.glob(os.path.join(folder, '*.*')):
    new_dir = file_path.rsplit('_', 2)[1]
    try:
        os.mkdir(os.path.join(folder, new_dir))
    except WindowsError:
        # Handle the case where the target dir already exists.
        pass
    shutil.move(file_path, os.path.join(new_dir, os.path.basename(file_path)))




Solution

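  • If you want to keep it basic, without regular expressions, you could just replace '.' with '_' before splitting. The extension then becomes a piece of its own, so the 8-character code is always the second piece, with or without a trailing _A, _B, etc.

    So something like this (instead of your new_dir assignment):

    new_dir = file_path.replace('.', '_').split('_')[1]

    Note that this splits the full path, so it only works as long as the folder part (here 'C:/test') contains no '.' or '_'.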
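
If you would rather not depend on what the folder path looks like, here is a minimal sketch of the same idea applied to the bare file name only. It assumes every file of interest is named PREFIX_CODE[_X].ext as in the question. The helper names target_folder_name and sort_into_folders are made up for this example, and files without an underscore are simply skipped, which is an assumption about what you want.

```python
import os
import shutil
from pathlib import Path


def target_folder_name(file_name):
    """Return the second underscore-separated piece of the name, extension removed."""
    stem = Path(file_name).stem      # 'WS1_APAL4900_A.pdf' -> 'WS1_APAL4900_A'
    parts = stem.split('_')
    if len(parts) < 2:
        return None                  # name does not follow PREFIX_CODE[_X]
    return parts[1]


def sort_into_folders(folder):
    """Move each file in `folder` into a subfolder named after its code."""
    # Take a snapshot of the directory first, since we add folders while working.
    for entry in list(os.scandir(folder)):
        if not entry.is_file():
            continue
        new_dir = target_folder_name(entry.name)
        if new_dir is None:
            continue
        dest_dir = os.path.join(folder, new_dir)
        os.makedirs(dest_dir, exist_ok=True)   # no error if the folder already exists
        shutil.move(entry.path, os.path.join(dest_dir, entry.name))
```

You would call it as sort_into_folders('C:/test'). Because the destination is built from folder rather than from new_dir alone, the result does not depend on the current working directory.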