python, regex, python-re

Regex to extract usernames/names from a string


I have strings that contain a name, or sometimes a username, followed by a datetime stamp:

GN1RLWFH0546-2020-04-10-18-09-52-563945.txt
JOHN-DOE-2020-04-10-18-09-52-563946t64.txt
DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt

I want to extract the usernames from these strings:

GN1RLWFH0546
JOHN-DOE
DESKTOP-OHK45JO

I have tried different regex patterns. The closest I came was extracting the following:

GN1RLWFH0546
DESKTOP
JOHN

Using the following regex pattern:

names = re.search(r"\(?([0-9A-Za-z]+)\)?", agent_str)
print(names.group(1))

Solution

  • You can match all text up to the first occurrence of a hyphen, one or more digits, and another hyphen (-\d+-):

    ^.*?(?=-\d+-)
    

    If the number must be exactly 4 digits (say, if it is a year), then replace + with {4}:

    ^.*?(?=-\d{4}-)
    

    Details

    • ^ - start of string
    • .*? - any 0 or more chars other than line break chars, as few as possible
    • (?=-\d+-) - a positive lookahead that requires -, then 1+ digits (exactly four digits if \d{4} is used), then - immediately after the current position. A lookahead is a non-consuming pattern, so this part is not added to the match value.
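
To make the "non-consuming" point concrete, here is a minimal sketch that runs the lookahead pattern next to two variants that are not part of the answer above: one that consumes the delimiter, and one that uses a capturing group instead. It reuses a sample string from the question; the variable names are only illustrative.

```python
import re

s = "JOHN-DOE-2020-04-10-18-09-52-563946t64.txt"

# Lookahead version: "-2020-" is checked but not consumed.
lookahead = re.search(r"^.*?(?=-\d+-)", s)

# Consuming version (for contrast): the delimiter becomes part of the match.
consuming = re.search(r"^.*?-\d+-", s)

# Alternative with a capturing group: read group 1 instead of the whole match.
captured = re.search(r"^(.*?)-\d+-", s)

print(lookahead.group())   # JOHN-DOE
print(consuming.group())   # JOHN-DOE-2020-
print(captured.group(1))   # JOHN-DOE
```

The lookahead and the capturing group give the same name here; the lookahead simply lets you use m.group() without an index.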

    Python demo:

    import re

    strs = [
        "GN1RLWFH0546-2020-04-10-18-09-52-563945.txt",
        "JOHN-DOE-2020-04-10-18-09-52-563946t64.txt",
        "DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt",
    ]
    rx = re.compile(r"^.*?(?=-\d+-)")  # text before the first -digits- segment
    for s in strs:
        m = rx.search(s)
        if m:  # skip strings that have no -digits- segment
            print("{} => '{}'".format(s, m.group()))
    

    Output:

    GN1RLWFH0546-2020-04-10-18-09-52-563945.txt => 'GN1RLWFH0546'
    JOHN-DOE-2020-04-10-18-09-52-563946t64.txt => 'JOHN-DOE'
    DESKTOP-OHK45JO-2020-04-09-02-27-11-451975.txt => 'DESKTOP-OHK45JO'
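
The difference between \d+ and \d{4} only shows up when the name itself contains a short all-digit segment. The sketch below illustrates that with a made-up file name ("LAB-7-PC-..." is hypothetical, not from the question) and wraps the search in a small helper, extract_name, which is also just an illustrative name. It assumes the timestamp always starts with a four-digit year.

```python
import re
from typing import Optional

YEAR_RX = re.compile(r"^.*?(?=-\d{4}-)")       # stop at -YYYY-
ANY_DIGITS_RX = re.compile(r"^.*?(?=-\d+-)")   # stop at the first -digits-

def extract_name(filename: str, rx: re.Pattern = YEAR_RX) -> Optional[str]:
    """Return the text before the timestamp, or None if no timestamp is found."""
    m = rx.search(filename)
    return m.group() if m else None

# Hypothetical name with a one-digit segment inside it.
sample = "LAB-7-PC-2020-04-10-18-09-52-563945.txt"

print(extract_name(sample, ANY_DIGITS_RX))  # LAB
print(extract_name(sample))                 # LAB-7-PC
print(extract_name("no-timestamp.txt"))     # None
```

Note that \d{4} is still a heuristic: a name containing its own four-digit segment between hyphens would be cut short in the same way.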