
Python regex: parsing file names


I have a text file (filenames.txt) that contains file names, one per line, each with its extension.

filenames.txt:

    [AW] One Piece - 629 [1080P][Dub].mkv
    EP.585.1080p.mp4
    EP609.m4v
    EP 610.m4v
    One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
    One_Piece_0745_Sons'_Cups!.mp4
    One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
    One Piece - 621 1080P.mkv
    One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4

These are example file names. I need to rename each file to just its episode number, keeping the original extension.

Example:

Input:
    EP609.m4v
    EP 610.m4v
    EP.585.1080p.mp4
    One Piece - 621 1080P.mkv
    [AW] One Piece - 629 [1080P][Dub].mkv 
    One_Piece_0745_Sons'_Cups!.mp4
    One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
    One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
    One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4

Expected Output:
    609.m4v
    610.m4v
    585.mp4
    621.mkv
    629.mkv
    745.mp4 (or 0745.mp4)
    696.mp4 (or 0696.mp4)
    591.m4v
    577.mp4

I'd appreciate any help parsing and renaming these files. Thanks in advance!


Solution

  • Since you tagged the question python, I assume a Python solution is fine.

    (Edit: I realized that the inner loop in my original code was unnecessary.)

    import re
    
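    with open('filenames.txt', 'r') as f:
        files = f.read().splitlines()  # one filename per line
    
    # assume: an episode number consists of 3 digits, possibly preceded by a 0
    p = re.compile(r'0?(\d{3})')
    for file in files:
        if m := p.search(file):  # walrus operator requires Python 3.8+
            # print the new name (episode + original extension); nothing is renamed yet
            print(m.group(1) + '.' + file.split('.')[-1])
        else:
            print(file)  # no episode number found: keep the name as is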
    

    For the files in the order of the Input list above, this prints

    609.m4v
    610.m4v
    585.mp4
    621.mkv
    629.mkv 
    745.mp4
    696.mp4
    591.m4v
    577.mp4
    

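    It takes the first run of three digits in each name, consuming a single leading 0 if there is one (so 0745 becomes 745), and appends the original extension. It only prints the new names; nothing is renamed. This is a heuristic: if a resolution such as 1080 appeared before the episode number, this pattern would return 108. The original answer below avoids that by matching 4-digit runs whole and skipping values of 1000 or more.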

    I strongly advise checking the output before renaming anything. In particular, save it to a file and run sort OUTPUTFILENAME | uniq -d to list any duplicate target names.

    (Original answer:)

    # reuses `re` and `files` from the snippet above
    # 3 or 4 digits, so a resolution like 1080 is matched whole and then skipped
    p = re.compile(r'\d{3,4}')
    
    for file in files:
        for m in p.finditer(file):
            ep = m.group(0)
            if int(ep) < 1000:
                print(ep.lstrip('0') + '.' + file.split('.')[-1])
                break  # episode found: stop searching and skip the else clause
        else:  # no episode found: print the filename as is
            print(file)
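The original answer's rule can also be wrapped in a small reusable function. This is a sketch under my own assumptions: the helper name episode_name is hypothetical, and it uses os.path.splitext instead of split('.') so that only the final extension is kept and the extension itself is never searched for digits. Because \d{3,4} matches a 4-digit run such as 1080 whole, the < 1000 check skips it, so a made-up name like "1080P One Piece 629.mkv" still yields 629.mkv.

```python
import os
import re

# Same rule as the original answer: the first run of 3 or 4 digits whose
# value is below 1000 (so 1080 is skipped), with leading zeros dropped.
EPISODE = re.compile(r'\d{3,4}')


def episode_name(filename):
    """Return '<episode><ext>' for filename (hypothetical helper).

    Returns filename unchanged if no episode number is found.
    """
    stem, ext = os.path.splitext(filename)  # ext keeps its dot, e.g. '.mkv'
    for m in EPISODE.finditer(stem):
        ep = m.group(0)
        if int(ep) < 1000:
            # "or '0'" guards against an all-zero match like "000"
            return (ep.lstrip('0') or '0') + ext
    return filename
```

Calling episode_name on each line of filenames.txt gives the same names the loop above prints for the sample list, so you can build a rename plan from it instead of printing.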
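To act on the advice about checking for duplicates before renaming, here is a minimal sketch that builds the full rename plan first, refuses to proceed if two files would get the same name or a target already exists, and only touches the disk when dry_run=False. The function names (target_name, plan_renames, rename_all) and the folder parameter are hypothetical; it assumes all files sit in one folder and reuses the 0?(\d{3}) pattern from the answer, taking the extension from Path.suffix. collections.Counter plays the role of sort | uniq -d.

```python
import re
from collections import Counter
from pathlib import Path

# Pattern from the answer: a 3-digit episode number, optionally preceded by 0.
EPISODE = re.compile(r'0?(\d{3})')


def target_name(filename):
    """Hypothetical helper: new name for filename, or None if no episode found."""
    m = EPISODE.search(filename)
    if m is None:
        return None
    return m.group(1) + Path(filename).suffix  # suffix keeps the dot, e.g. '.mp4'


def plan_renames(filenames):
    """Map old name -> new name and list target names used more than once."""
    plan = {}
    for name in filenames:
        new = target_name(name)
        if new is not None and new != name:
            plan[name] = new
    counts = Counter(plan.values())
    duplicates = sorted(new for new, n in counts.items() if n > 1)
    return plan, duplicates


def rename_all(folder, filenames, dry_run=True):
    """Rename files inside folder; with dry_run=True only print the plan."""
    plan, duplicates = plan_renames(filenames)
    if duplicates:
        raise ValueError(f"duplicate target names: {duplicates}")
    folder = Path(folder)
    # Check every target before renaming anything, so a clash never
    # leaves the folder half-renamed (on POSIX, rename silently overwrites).
    clashes = [new for new in plan.values() if (folder / new).exists()]
    if clashes:
        raise FileExistsError(f"targets already exist: {clashes}")
    for old, new in plan.items():
        print(f"{old} -> {new}")
        if not dry_run:
            (folder / old).rename(folder / new)
```

For example, read the names with f.read().splitlines() as in the answer, call rename_all('.', names) to see the printed plan, and rerun it with dry_run=False only once the plan looks right.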