python, python-3.x, regex, string, python-pattern

Split a string based on a pattern in Python


I am trying to remove a pattern from my string and keep only the part I want to store.

example                                return

2022_09_21_PTE_Vendor                  PTE
2022_09_21_SSS_01_Vendor               SSS_01
2022_09_21_OOS_market                  OOS

What I tried:

fileName = "2022_09_21_PTE_Vendor"
newFileName = fileName.strip(re.split('[0-9]','_Vendor.xlsx'))

Solution

  • Using Python's re module, try the following code, which uses the re.sub function. It was written and tested in Python 3. See the documentation for re and re.sub for details.

    Here is the online demo for the regex used.

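    import re
    fileName = "2022_09_21_PTE_Vendor"

    # Replace the whole string with the contents of the first capturing group.
    newFileName = re.sub(r'^\d{4}(?:_\d{2}){2}_(.*?)_.+$', r'\1', fileName)
    print(newFileName)  # PTE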

    Explanation: a piece-by-piece breakdown of the regex used.

    ^\d{4}   ## From the start of the string, match 4 digits.
    (?:      ## Open a non-capturing group.
    _\d{2}   ## Match an underscore followed by 2 digits.
    ){2}     ## Close the non-capturing group and match it exactly 2 times.
    _        ## Match a single underscore.
    (.*?)    ## Capturing group with a lazy match: capture as few characters as possible before the next underscore.
    _.+$     ## Match an underscore and everything after it, up to the end of the string.
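
One thing to watch with the lazy group: because (.*?) stops at the first underscore it meets, the second sample in the question (2022_09_21_SSS_01_Vendor) comes back as SSS rather than SSS_01. The sketch below compares the pattern above with a greedy variant that captures up to the last underscore. The names LAZY, GREEDY and extract are made up for this illustration, and the greedy version assumes the trailing suffix (Vendor, market) never contains an underscore itself.

```python
import re

# Lazy pattern from the answer above: the capture stops at the first underscore.
LAZY = re.compile(r'^\d{4}(?:_\d{2}){2}_(.*?)_.+$')

# Hypothetical greedy variant: the capture runs up to the last underscore.
# Assumes the final suffix contains no underscore of its own.
GREEDY = re.compile(r'^\d{4}(?:_\d{2}){2}_(.*)_[^_]+$')


def extract(pattern, file_name):
    """Return the captured middle part, or None if the name does not match."""
    match = pattern.match(file_name)
    return match.group(1) if match else None


samples = [
    "2022_09_21_PTE_Vendor",
    "2022_09_21_SSS_01_Vendor",
    "2022_09_21_OOS_market",
]

lazy_results = [extract(LAZY, s) for s in samples]
greedy_results = [extract(GREEDY, s) for s in samples]

print(lazy_results)    # ['PTE', 'SSS', 'OOS']
print(greedy_results)  # ['PTE', 'SSS_01', 'OOS']
```

If the suffix can contain underscores, neither version is enough on its own, and listing the allowed suffixes explicitly in the pattern is probably the safer route.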