Tags: python, regex, telnet

regex for filtering word based on length or by excluding a word in Python


I have been trying to figure this one out, but since I am a newbie at regex I haven't been able to. I need to select the right lines from some telnet output, which looks like the following:

systemstatus get resume    # line to exclude
systemstatus get idle      # line to filter
systemstatus get talking   # line to filter
systemstatus get ringing   # line to filter
systemstatus get outgoing  # line to filter
systemstatus get sleeping  # line to filter

As you can see, I need to exclude the line with resume and select all the others. I know I could filter by length, but I only know how to require a minimum length, not a set of specific lengths. For example, "systemstatus get \w{7,}" would exclude the resume line but also the idle line. So what I actually need is something that matches lengths of 4, 7 and 8.

Does anyone know how to do this?

Note: this must be done in regex because of the telnet library.

Note 2: Since it is telnet, I have to keep reading when systemstatus get resume appears (that's what I mean by "excluding") instead of stopping, as I would when systemstatus get idle comes in. Matching "systemstatus get WHATEVER" and then excluding "resume" afterwards would stop reading as soon as "resume" comes in. I am using telnet.expect([], timeout) from the telnet lib.


Solution

  • Option 1
    Call re.findall with the re.MULTILINE switch (the switch only affects ^ and $, so it is optional for this pattern).

    import re
    matches = re.findall(r"systemstatus get \b(?:\w{4}|\w{7,8})\b", t, re.M)  # t: the telnet output as one string
    

    This returns all matches as a list of strings.

    Regex Details

    systemstatus get    # literals
    \b                  # word boundary
    (?:                 # non-capturing group
    \w{4}               # find a word of size 4 
    |                   # regex OR pipe
    \w{7,8}             # find a word of size 7 or 8
    )                   # end of group
    \b                  # word boundary
    

    We're matching by word size here because of your requirement:

    I need something that filters lengths of 4, 7 and 8.
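
As a quick check, here is a minimal sketch that runs the Option 1 pattern against the sample output from the question. It assumes the telnet output is already held in one multi-line string `t`; the name `PATTERN` is only for illustration.

```python
import re

# Sample telnet output from the question, assumed to be held in one string.
t = """systemstatus get resume
systemstatus get idle
systemstatus get talking
systemstatus get ringing
systemstatus get outgoing
systemstatus get sleeping"""

# Same pattern as in Option 1: a status word of length 4, 7 or 8.
PATTERN = r"systemstatus get \b(?:\w{4}|\w{7,8})\b"

matches = re.findall(PATTERN, t, re.M)

# "resume" has 6 letters, so neither \w{4}\b nor \w{7,8}\b can match it.
for m in matches:
    print(m)
```

With this sample, the list holds the five lines other than the resume line. Note that the length trick would also skip any other six-letter status word, should one exist.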


  • Option 2
    Split your multi-line string into separate lines, iterate over them, and call re.match on each one (re.match only matches at the start of the line, so no ^ anchor is needed):

    matches = []
    
    for line in t.splitlines():
        if re.match(r"systemstatus get \b(?:\w{4}|\w{7,8})\b", line):
            matches.append(line)
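
The question title also asks about excluding a word. One possible alternative, not part of the two options above, is a negative lookahead that rejects resume and accepts any other status word. The sketch below does not open a real connection: it imitates incoming data with made-up chunks and searches a growing buffer, on the assumption that this is roughly how a pattern passed to telnet.expect would be applied. It uses a bytes pattern on the assumption that Python 3's telnetlib works with bytes. The names `NOT_RESUME`, `chunks` and `found` are hypothetical.

```python
import re

# Hypothetical alternative: accept any status word except "resume".
# The trailing \r?\n requires the line to be complete, so a partially
# received "res" is not mistaken for a valid status word.
NOT_RESUME = re.compile(rb"systemstatus get (?!resume\b)(\w+)\r?\n")

# Made-up chunks standing in for data arriving over the connection.
chunks = [b"systemstatus get res", b"ume\r\nsystemstatus ", b"get idle\r\n"]

buffer = b""
found = None
for chunk in chunks:
    buffer += chunk
    m = NOT_RESUME.search(buffer)
    if m:
        # Stop at the first status line that is not "resume".
        found = m.group(1)
        break

print(found)
```

Here the resume line is read past without producing a match, and the loop stops only once the idle line has fully arrived. Unlike the length-based pattern, this does not depend on how long the other status words are.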