Tags: python, regex, python-re

Python: Using regex to get the episode number


This code works, except with this file name:

Terkel in Trouble 2004

It should return None, but instead the match is 'e 200' because of:

      e|x|episode|Ep|^        

and

    (\d{2,3})                 

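The e alternative matches the final letter of "Trouble", and (\d{2,3}) then captures the first three digits of the year. How can I prevent that?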

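    import re

    def getEpisode(filename):
        # (?ix): case-insensitive, verbose (whitespace in the pattern is ignored)
        match = re.search(
            r'''(?ix)
            (?:
              e|x|episode|Ep|^   # episode marker, or start of string
              )
            \s*                  # optional whitespace
            (\d{2,3})            # episode number: 2 or 3 digits
            ''', filename)
        if match:
            print(match)
            return match.group(1)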
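
To make the unwanted match visible, the pattern from the question can be run directly on the problem title. This is only a small reproduction sketch: the name `ORIGINAL_RE` is made up for illustration, and the verbose pattern is collapsed onto one line.

```python
import re

# The pattern from the question, collapsed onto one line.
# ORIGINAL_RE is a hypothetical name used only for this illustration.
ORIGINAL_RE = re.compile(r"(?:e|x|episode|Ep|^)\s*(\d{2,3})", re.IGNORECASE)

match = ORIGINAL_RE.search("Terkel in Trouble 2004")
print(repr(match.group(0)))  # full matched text: 'e 200'
print(match.group(1))        # captured digits: 200
```

The final "e" of "Trouble" satisfies the `e` alternative, `\s*` consumes the space, and `(\d{2,3})` takes "200" out of "2004".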


**EDIT:** Test cases:

    test = (
        "0x01 GdG LO Star Lord  Part 1",             #1
        "S01E01 GdG  Verso Nowhere",                 #2
        "Wacky Races Episode 20 X264 Ac3",           #3
        "Terkel in Trouble 2004",                    #4 should return None
        "Yu Yu Hakusho  Ep 100  secret",             #5
        "Kakegurui S1 Ep11 La donna che scommette",  #6
        "Kakegurui S1 Ep12 La donna che gioca",      #7
        "ep 01 wolf's rain",                         #8
        "Toradora! 08"                               #9
    )

Solution

  • Try using word boundaries (\b), so that a run of 2 or 3 digits inside a longer number such as 2004 is not matched.

    Updated regex (used with the case-insensitive flag):

    \b(?:e(?:p(?:isode)?)?|0x|S\d\dE)?\s*?(\d{2,3})\b
    

    Results (text matched for each test case):

    1 ->  0x01
    2 ->  S01E01
    3 ->  Episode 20
    4 ->  (no match)
    5 ->  Ep 100
    6 ->  Ep11
    7 ->  Ep12
    8 ->  ep 01
    9 ->  08
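
As a rough sketch of how the updated regex could be dropped into the original function: the names `EPISODE_RE`, `get_episode` and `titles` below are illustrative rather than from the question, and `re.IGNORECASE` is assumed because the results above include "Episode", "Ep" and "ep". The pattern is the same one, spread out with `re.VERBOSE` so each part can be commented.

```python
import re

# Verbose form of the updated pattern. re.IGNORECASE is an assumption,
# needed so that "Episode", "Ep" and "ep" are all accepted.
EPISODE_RE = re.compile(
    r"""
    \b                       # start at a word boundary
    (?:                      # optional episode marker:
        e(?:p(?:isode)?)?    #   e, ep or episode
      | 0x                   #   or the literal prefix 0x
      | S\d\dE               #   or a season prefix such as S01E
    )?
    \s*?                     # optional whitespace
    (\d{2,3})                # the episode number (2 or 3 digits)
    \b                       # the number must end at a word boundary
    """,
    re.IGNORECASE | re.VERBOSE,
)


def get_episode(filename):
    """Return the episode number as a string, or None if none is found."""
    match = EPISODE_RE.search(filename)
    return match.group(1) if match else None


titles = (
    "0x01 GdG LO Star Lord  Part 1",
    "S01E01 GdG  Verso Nowhere",
    "Wacky Races Episode 20 X264 Ac3",
    "Terkel in Trouble 2004",
    "Yu Yu Hakusho  Ep 100  secret",
    "Kakegurui S1 Ep11 La donna che scommette",
    "Kakegurui S1 Ep12 La donna che gioca",
    "ep 01 wolf's rain",
    "Toradora! 08",
)

for title in titles:
    print(title, "->", get_episode(title))
```

For "Terkel in Trouble 2004" the trailing `\b` cannot be satisfied after "200" or "20", because the next character is still a digit, so the function returns None.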