
Grab part of filename with Python


Newbie here.

I've only been working with Python (and coding in general) for a few days, but I want to create a script that grabs the parts of filenames that match a certain pattern and writes them to a text file.

So in my case, let's say I have four .pdf files like this:

aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf

(Note that they are of variable length.)

I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.

Can you point me toward the Python modules, and possibly guides, that could help me?


Solution

  • Here's a simple solution using the re module as mentioned in other answers.

    # Libraries
    import re
    
    # Example filenames. Use glob to grab your real pdf filenames
    file_list = ['name_ID_123.pdf', 'name2_ID_456.pdf'] # glob.glob("*.pdf")
    
    for fname in file_list:
        # Raw string, with the dot escaped so it only matches a literal "."
        res = re.findall(r"ID_(\d+)\.pdf", fname)
        if not res:
            continue # Skip filenames that don't match the pattern
        print(res[0]) # You can append the result to a list
    

    The output should look like this. You should be able to adapt the regex to other patterns.

    # Output
    123
    456
    

    Good luck!
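
The question also asks for the results to end up in a text file, which the snippet above leaves as an exercise. Here is a minimal sketch of one way to do that, assuming the IDs are always digits (as in the example filenames) and that the PDFs sit in a single folder. The names `extract_ids`, `write_ids`, and `ids.txt` are made up for illustration and are not from the original answer.

```python
import re
from pathlib import Path

# "ID_" followed by digits, then a literal ".pdf" at the end of the name
ID_PATTERN = re.compile(r"ID_(\d+)\.pdf$")


def extract_ids(filenames):
    """Return the digits between "ID_" and ".pdf" for each matching filename."""
    ids = []
    for name in filenames:
        match = ID_PATTERN.search(name)
        if match:  # silently skip names that don't fit the pattern
            ids.append(match.group(1))
    return ids


def write_ids(folder, out_file):
    """Collect IDs from the PDFs in `folder` and write one per line to `out_file`."""
    names = sorted(p.name for p in Path(folder).glob("*.pdf"))
    ids = extract_ids(names)
    Path(out_file).write_text("".join(i + "\n" for i in ids))
    return ids


# Example usage (hypothetical paths):
# write_ids(".", "ids.txt")
```

If the part after "ID_" can contain letters or other characters, swapping `\d+` for something like `.+` should work, at the cost of matching less strictly.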