
Create folders from part of the file names and move the files into them


I have some PDF files, and I need to create folders named after part of each file's name, then move the PDFs into those folders.

import glob
import os
import shutil

cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
for file in glob.glob("*.pdf"):
    # Remove padrao and the 'P' characters to get the folder name
    dst = cwd + "\\" + file.replace(str(padrao), '').replace('P', '')
    os.mkdir(dst)
    shutil.move(file, dst)

For example, I have the files P9883231_V1_A0V0_T07-54-369-664_S00001.pdf, P9883231_V1_A0V0_T07-54-369-664_S00002.pdf and P1235567_V1_A0V0_T07-54-369-664_S00001.pdf.

In this example I need the script to create two folders: 9883231 and 1235567 (the number right after the P must be the folder name).

Notice that in my code I remove the unwanted parts to build the folder name: the 'P' at the beginning and the suffix stored in padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'.

The problem is that the number at the end of padrao varies: a file can also end with "02.pdf", "03.pdf", and so on, so the fixed suffix is not always found.
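A quick check on two of the example names shows why the fixed replace() is not enough. This sketch runs only string operations, nothing touches the disk, and folder_name_by_replace is just an illustrative name for the question's logic:

```python
# Suffix from the question; it only matches files ending in S00001.pdf
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'


def folder_name_by_replace(file):
    # Same logic as the question: drop padrao, then drop every 'P'
    return file.replace(padrao, '').replace('P', '')


print(folder_name_by_replace('P9883231_V1_A0V0_T07-54-369-664_S00001.pdf'))
# 9883231
print(folder_name_by_replace('P9883231_V1_A0V0_T07-54-369-664_S00002.pdf'))
# 9883231_V1_A0V0_T07-54-369-664_S00002.pdf  (padrao not found, only the P is removed)
```

Note also that replace('P', '') removes every uppercase P in the name, not only the leading one.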

In the example I mentioned above, the folder 9883231 should contain both files.
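If every name follows the same layout, the ID could also be taken from the text before the first underscore, without a regex. A minimal sketch, assuming that layout holds for all files (folder_name is a hypothetical helper), which only computes the grouping and does not touch the disk:

```python
from collections import defaultdict

files = [
    'P9883231_V1_A0V0_T07-54-369-664_S00001.pdf',
    'P9883231_V1_A0V0_T07-54-369-664_S00002.pdf',
    'P1235567_V1_A0V0_T07-54-369-664_S00001.pdf',
]


def folder_name(file):
    # Assumption: the ID is everything before the first "_", minus a leading "P"
    prefix = file.split('_', 1)[0]
    return prefix[1:] if prefix.startswith('P') else prefix


# Map each folder name to the list of files that belong in it
groups = defaultdict(list)
for f in files:
    groups[folder_name(f)].append(f)

for folder, members in groups.items():
    print(folder, members)
```

Each key is a folder to create and each list holds the files to move into it, so 9883231 ends up with both of its files.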


Solution

  • Regular expressions can do the trick here:

    import re
    import os
    import glob
    import shutil
    
    
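    cwd = os.getcwd()
    padrao = '_V1_A0V0_T07-54-369-664_S000'
    for file in glob.glob("*.pdf"):
        # Capture everything between the "P" and the fixed part of padrao
        dst = os.path.join(cwd, re.findall(r"P(.*)" + padrao + r"\d{2}\.pdf", file)[0])

        # exist_ok=True: the folder may already exist from an earlier file with the same ID
        os.makedirs(dst, exist_ok=True)
        shutil.move(file, dst)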
    

    Notice that I removed the part of padrao that varies. The regex matches names containing a P, followed by any characters (captured as the folder name), followed by the padrao value, 2 digits and .pdf; [0] takes the first match (no check is made here whether anything was found, so a PDF with a different name raises an IndexError).

    Also, it is better practice to use os.path.join() to build paths, since it inserts the right separator for the current OS (notably when switching between Windows and Linux).
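Putting a match check and portable path handling together, here is a sketch using pathlib, where the / operator plays the role of os.path.join(). This is a swapped-in alternative to the answer's loop, not part of it; organize_pdfs is a hypothetical name, and the pattern assumes the ID is all digits (the answer's (.*) accepts anything):

```python
import re
import shutil
from pathlib import Path

padrao = '_V1_A0V0_T07-54-369-664_S000'
# Assumption: the ID after "P" is all digits; the last two digits before .pdf may vary
pattern = re.compile(r"P(\d+)" + re.escape(padrao) + r"\d{2}\.pdf")


def organize_pdfs(folder):
    """Move each matching PDF in `folder` into a subfolder named after its ID."""
    base = Path(folder)
    # sorted() builds the full list first, so moving files does not disturb the iteration
    for pdf in sorted(base.glob("*.pdf")):
        match = pattern.fullmatch(pdf.name)
        if match is None:
            continue  # leave files with unexpected names where they are
        dst = base / match.group(1)  # "/" joins paths portably, like os.path.join
        dst.mkdir(exist_ok=True)     # the folder may exist from an earlier file
        shutil.move(str(pdf), str(dst / pdf.name))
```

Calling organize_pdfs(Path.cwd()) processes the current directory like the answer does; names that don't match are skipped instead of raising an IndexError, and mkdir(exist_ok=True) lets several files share one folder.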