Tags: python, regex, string, regex-lookarounds, regex-group

Regex to find N characters between underscore and period


I have a filename that contains digits, such as test_20200331_2020041612345678.csv.

I want to read only the first 8 characters of the number between the last underscore and .csv using a regex. For example, from the filename test_20200331_2020041612345678.csv I want to read only 20200416.

Regex tried: (?<=_)(\d+)(?=\.)

But it returns the full number between the underscore and the period, i.e. 2020041612345678.

Also, when I tried a quantifier, as in (?<=_)(\d{8})(?=\.), it did not match anything.


Solution

  • The (?<=_)(\d{8})(?=\.) pattern does not work because the (?=\.) positive lookahead requires a . char immediately to the right of the current location, i.e. right after the eighth digit, but in this string the eighth digit is followed by more digits.

    You may add \d* before \. in the lookahead to allow any number of digits after the required 8 digits:

    (?<=_)\d{8}(?=\d*\.)
    

    Or, with a capturing group, you do not even need lookarounds (just make sure you access Group 1 when a match is obtained):

    _(\d{8})\d*\.
    

    Python demo:

    import re
    s = "test_20200331_2020041612345678.csv"
    m = re.search(r"(?<=_)\d{8}(?=\d*\.)", s)
    # m = re.search(r"_(\d{8})\d*\.", s) # capturing group approach
    if m:
        print(m.group())  # => 20200416
        # print(m.group(1))  # capturing group approach
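
As a further sketch (not part of the original answer), the two patterns from the question and the two fixes can be compared side by side, and the capturing-group pattern can be wrapped in a small helper. The helper name first_digits_before_ext, its n parameter, and the added \.[^.]+$ tail that ties the match to the final extension are assumptions introduced here for illustration.

```python
import re

s = "test_20200331_2020041612345678.csv"

# Pattern from the question: \d+ is greedy, so it takes the whole digit run.
print(re.search(r"(?<=_)(\d+)(?=\.)", s).group())       # => 2020041612345678

# Pattern from the question with {8}: the eighth digit is followed by
# another digit, not by ".", so the lookahead fails and nothing matches.
print(re.search(r"(?<=_)(\d{8})(?=\.)", s))             # => None

# Fixed lookaround version: allow any further digits before the ".".
print(re.search(r"(?<=_)\d{8}(?=\d*\.)", s).group())    # => 20200416


def first_digits_before_ext(filename, n=8):
    """Hypothetical helper: first n digits of the digit run that sits
    between an underscore and the final extension dot, or None."""
    # Assumption: the digit run is directly followed by the last "." of
    # the name; \.[^.]+$ enforces that. {{ and }} are literal braces.
    m = re.search(rf"_(\d{{{n}}})\d*\.[^.]+$", filename)
    return m.group(1) if m else None


print(first_digits_before_ext(s))                # => 20200416
print(first_digits_before_ext("test_1234.csv"))  # => None (fewer than 8 digits)
```

Returning None instead of raising keeps the helper consistent with re.search, so callers can use the same `if m:` style check as in the demo above.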