python bash file-handling file-rename python-re

python regex: Parsing file name

I have a text file (filename.txt) that contains file names, each with its file extension, one per line.

filename.txt

    [AW] One Piece - 629 [1080P][Dub].mkv
    EP.585.1080p.mp4
    EP609.m4v
    EP 610.m4v
    One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
    One_Piece_0745_Sons'_Cups!.mp4
    One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
    One Piece - 621 1080P.mkv
    One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4

These are example filenames with their extensions. I need to rename each file to its episode number, without changing its extension.

Example:

Input:
    EP609.m4v
    EP 610.m4v
    EP.585.1080p.mp4
    One Piece - 621 1080P.mkv
    [AW] One Piece - 629 [1080P][Dub].mkv 
    One_Piece_0745_Sons'_Cups!.mp4
    One Piece 0696 A Tearful Reunion! Rebecca and Kyros!.mp4
    One Piece - 591 (1080P Funi Web-Dl -Ks-)-1.m4v
    One_Piece_S10E577_Zs_Ambition_A_Great_and_Desperate_Escape_Plan.mp4

Expected Output:
    609.m4v
    610.m4v
    585.mp4
    621.mkv
    629.mkv
    745.mp4 (or) 0745.mp4
    696.mp4 (or) 0696.mp4
    591.m4v
    577.mp4

I hope someone can help me parse and rename these filenames. Thanks in advance!

Solution

Since you tagged the question python, I assume you are willing to use Python.

(Edit: I realized that the inner loop in my original code is unnecessary.)

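    import re

    # Read the filenames, ignoring surrounding whitespace and blank lines
    with open('filename.txt', 'r') as f:
        files = [line.strip() for line in f if line.strip()]

    # Assumption: an episode number consists of 3 digits, possibly preceded by a 0
    p = re.compile(r'0?(\d{3})')
    for file in files:
        if m := p.search(file):  # the walrus operator requires Python 3.8+
            # episode number + the original extension (text after the last '.')
            print(m.group(1) + '.' + file.split('.')[-1])
        else:
            # no episode number found: print the name unchanged
            print(file)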

This outputs:

    609.m4v
    610.m4v
    585.mp4
    621.mkv
    629.mkv
    745.mp4
    696.mp4
    591.m4v
    577.mp4

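Basically, it takes the first run of three digits (optionally preceded by a 0) as the episode number and appends the original extension; names with no match are printed unchanged. Be aware that a resolution tag placed before the episode number would be matched first (for example, `1080` would yield `108`).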
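
To make this behaviour easy to test in isolation, the same regex can be wrapped in a small function. This is a sketch, not part of the original answer: the name `episode_name` is hypothetical, and it uses `os.path.splitext` to split off the extension so that only the part before the extension is searched.

```python
import os
import re

# Same pattern as in the answer: 3 digits, optionally preceded by a leading 0.
EPISODE_RE = re.compile(r'0?(\d{3})')


def episode_name(filename):
    """Return '<episode><ext>' for filename, or None if no episode number is found.

    episode_name is a hypothetical helper used for illustration.
    """
    stem, ext = os.path.splitext(filename)  # ext keeps its dot, e.g. '.mkv'
    m = EPISODE_RE.search(stem)
    return m.group(1) + ext if m else None


if __name__ == '__main__':
    for name in ['EP609.m4v', 'EP.585.1080p.mp4', "One_Piece_0745_Sons'_Cups!.mp4"]:
        print(name, '->', episode_name(name))
```

Calling it on a name such as `'[1080p] EP 585.mp4'` returns `'108.mp4'`, which is exactly the kind of result that checking the output is meant to catch.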

I strongly advise you to check the output before renaming anything; in particular, redirect it to a file and run `sort OUTPUTFILENAME | uniq -d` to see whether any target names are duplicated.
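
If you would rather do the duplicate check and the actual renaming in Python instead of with `sort | uniq -d`, here is one possible sketch. Assumptions: all files live in a single directory passed as `directory`, the helper names `plan_renames`, `find_duplicates` and `apply_renames` are hypothetical, and renaming is refused entirely if two files would get the same name or a target name already exists.

```python
import re
from collections import Counter
from pathlib import Path

# Same episode pattern as above: 3 digits, optionally preceded by a 0.
EPISODE_RE = re.compile(r'0?(\d{3})')


def plan_renames(filenames):
    """Map each filename to '<episode><suffix>'; names without a match are left out."""
    plan = {}
    for name in filenames:
        path = Path(name)
        m = EPISODE_RE.search(path.stem)  # search only the part before the extension
        if m:
            plan[name] = m.group(1) + path.suffix
    return plan


def find_duplicates(plan):
    """Return the target names that more than one source file would receive."""
    counts = Counter(plan.values())
    return sorted(target for target, n in counts.items() if n > 1)


def apply_renames(plan, directory):
    """Rename files inside directory, but only if no target name collides."""
    directory = Path(directory)
    duplicates = find_duplicates(plan)
    if duplicates:
        raise ValueError(f'duplicate target names: {duplicates}')
    # Skip files that already have their target name.
    pending = {src: dst for src, dst in plan.items() if src != dst}
    existing = [dst for dst in pending.values() if (directory / dst).exists()]
    if existing:
        raise FileExistsError(f'target names already exist: {existing}')
    for src, dst in pending.items():
        (directory / src).rename(directory / dst)
```

Inspect the result of `plan_renames` (and `find_duplicates`) before calling `apply_renames`; if a collision is detected, no file is renamed.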

(Original answer:)

    p = re.compile(r'\d{3,4}')  # candidate episode numbers: 3 or 4 digits

    for file in files:
        for m in p.finditer(file):
            ep = m.group(0)
            if int(ep) < 1000:  # skip values such as 1080 (a resolution, not an episode)
                print(ep.lstrip('0') + '.' + file.split('.')[-1])
                break  # episode found: go to the next file and skip the else clause
        else:  # no episode found: print the filename as is
            print(file)
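
The "skip numbers of 1000 or more" idea from the original answer can also be packaged as a function that works on the name without its extension. This is a sketch; `episode_name_alt` is a hypothetical name, and `os.path.splitext` is used here instead of `split('.')`.

```python
import os
import re

# Candidate episode numbers: runs of 3 or 4 digits (as in the original answer).
CANDIDATE_RE = re.compile(r'\d{3,4}')


def episode_name_alt(filename):
    """Return '<episode><ext>', skipping candidates >= 1000 such as 1080; None if none fit."""
    stem, ext = os.path.splitext(filename)
    for m in CANDIDATE_RE.finditer(stem):
        ep = m.group(0)
        if int(ep) < 1000:
            return ep.lstrip('0') + ext  # e.g. '0745' -> '745'
    return None
```

Unlike the `0?(\d{3})` version, this one returns `'585.mp4'` for a name like `'[1080p] EP 585.mp4'`, because 1080 is rejected as a candidate. It would still misfire on any other 3-digit number that appears before the episode number.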