Grab part of filename with Python

Newbie here.

I've only been working with Python (and coding in general) for a few days, but I want to create a script that grabs the parts of filenames that match a certain pattern and writes them to a text file.

So in my case, let's say I have four .pdf files like this:

aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf

(Note that they are of variable length.)

I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.

Can you point me toward the Python modules, and possibly guides, that could help me with this?

Solution

Here's a simple solution using the re module as mentioned in other answers.

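# Libraries
import re

# Example filenames. To use your real PDFs, replace this list with
# glob.glob("*.pdf") (this needs "import glob" as well)
file_list = ['name_ID_123.pdf', 'name2_ID_456.pdf']

for fname in file_list:
    # Capture the digits between "ID_" and the ".pdf" extension.
    # The raw string and "\." make the dot match a literal "." only.
    res = re.findall(r"ID_(\d+)\.pdf$", fname)
    if not res:
        continue  # Skip filenames that don't match the pattern
    print(res[0])  # You can append the result to a list instead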

And below should be your output. You should be able to adapt this to other patterns.

# Output
123
456
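
Since the question also asks for the results to end up in a text file, here is a rough sketch of how the loop above could be extended to do that. The function names (extract_ids, write_ids) and the output filename used in the usage note are made up for illustration, and it assumes the IDs are always digits and the files end in ".pdf".

```python
import re
from pathlib import Path

# Digits between "ID_" and a ".pdf" extension at the end of the name.
# Assumption: IDs are numeric; change \d+ if yours are not.
ID_PATTERN = re.compile(r"ID_(\d+)\.pdf$")


def extract_ids(filenames):
    """Return the ID part of every filename that matches the pattern."""
    ids = []
    for name in filenames:
        match = ID_PATTERN.search(name)
        if match:  # Names without an "ID_<digits>.pdf" tail are skipped
            ids.append(match.group(1))
    return ids


def write_ids(folder, out_file):
    """Find the PDFs in `folder` and write their IDs to `out_file`, one per line."""
    # Sort the names so the output order is predictable
    names = sorted(p.name for p in Path(folder).glob("*.pdf"))
    ids = extract_ids(names)
    Path(out_file).write_text("".join(i + "\n" for i in ids))
    return ids
```

Calling write_ids(".", "ids.txt") would scan the current folder and create ids.txt there. Keeping the matching in its own function means you can try it on a plain list of names first, without touching any files.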

Good luck!