python, regex, python-re

Expression that captures all characters up to a group of characters


I have several alerts coming from a DC server, which have the following pattern:

alert name - risk score - severity - total

The examples of these alerts would be:

A member was added to a security-enabled local group 47 medium 2
A member was added to a security-enabled universal group 47 medium 1
A security-enabled global group was changed 73 high 2
A security-enabled local group was changed 73 high 2
A user account was locked out  47 medium 31
An attempt was made to reset an accounts password  73 high 14
Member added to security-enabled global group  73 high 2
PowerShell Keylogging Script 73 high 23
PowerShell Suspicious Script with Audio Capture Capabilities 47 medium 23
More Than 3 Failed Login Attempts Within 1 Hour  47 medium 6
Over 100 Connection from 10 Diff. IPs 47 medium 234
Over 100 Connections Attempted 73 high 123
Failed Logins Not Followed by Success Within 2 Hours 21 low 8

I've been using the following pattern to capture only the name of the alerts:

^(\D*)

Essentially, this captures everything up to the first digit, but I have now received a few alerts I hadn't accounted for, whose names contain digits. For example:

More Than 3 Failed Login Attempts Within 1 Hour  47 medium 6
Over 100 Connection from 10 Diff. IPs 47 medium 234
Over 100 Connections Attempted 73 high 123
Failed Logins Not Followed by Success Within 2 Hours 21 low 8

So I need to capture the complete name; otherwise, I end up with:

More Than
Over
Over
Failed Logins Not Followed by Success Within
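
For reference, the truncation can be reproduced with a minimal sketch. The two sample strings are copied from the alerts above; the variable names are only illustrative.

```python
import re

# Two of the alert lines quoted above.
alerts = [
    "A security-enabled global group was changed 73 high 2",
    "Over 100 Connections Attempted 73 high 123",
]

# \D* consumes non-digit characters only, so the capture ends right
# before the first digit in the line, wherever that digit happens to be.
names = [re.match(r"^(\D*)", alert).group(1) for alert in alerts]

for name in names:
    print(repr(name))
```

The first name comes out complete (plus a trailing space), while the second stops at "Over " because the name itself contains a digit.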

Despite my efforts, I have not been able to capture the desired pattern. This would be the desired output:

A member was added to a security-enabled local group
A member was added to a security-enabled universal group
A security-enabled global group was changed
A security-enabled local group was changed
A user account was locked out 
An attempt was made to reset an accounts password 
PowerShell Keylogging Script 
PowerShell Suspicious Script with Audio Capture Capabilities
More Than 3 Failed Login Attempts Within 1 Hour
Over 100 Connection from 10 Diff. IPs 
Over 100 Connections Attempted
Failed Logins Not Followed by Success Within 2 Hours

Thanks for taking the time to help!


Solution

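  • Here is one possible regular expression. Note: I am assuming that alerts is a list of strings.
    The pattern anchors at the start of the string and captures any run of characters with ^(.*). It then requires \s (a single whitespace character), \d+ (one or more digits, the risk score), \s, \w+ (one or more word characters, the severity), \s, and a final \d+ (the total) at the end of the string ($). Because the three trailing fields are anchored to the end of the line, digits inside the alert name no longer cut the capture short. Where two spaces separate the name from the risk score, the captured name keeps one trailing space.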

    import re
    
    data = """
    A member was added to a security-enabled local group 47 medium 2
    A member was added to a security-enabled universal group 47 medium 1
    A security-enabled global group was changed 73 high 2
    A security-enabled local group was changed 73 high 2
    A user account was locked out  47 medium 31
    An attempt was made to reset an accounts password  73 high 14
    Member added to security-enabled global group  73 high 2
    PowerShell Keylogging Script 73 high 23
    PowerShell Suspicious Script with Audio Capture Capabilities 47 medium 23
    More Than 3 Failed Login Attempts Within 1 Hour  47 medium 6
    Over 100 Connection from 10 Diff. IPs 47 medium 234
    Over 100 Connections Attempted 73 high 123
    Failed Logins Not Followed by Success Within 2 Hours 21 low 8
    """
    
    alerts = data.splitlines()
    
    pattern = re.compile(r'^(.*)\s\d+\s\w+\s\d+$')
    
    for alert in alerts:
        res = pattern.search(alert)
        if res:
            print(res.group(1))
    

    Alternatively, you can collect all the matches with a list comprehension and then unpack the list in a single print call, instead of printing one match at a time in the for loop:

    # The walrus operator (Python 3.8+) keeps the match object,
    # so each line is searched only once.
    res = [m.group(1)
           for alert in alerts if (m := pattern.search(alert))]
    print(*res, sep="\n")
    

    Output:

    A member was added to a security-enabled local group
    A member was added to a security-enabled universal group
    A security-enabled global group was changed
    A security-enabled local group was changed
    A user account was locked out
    An attempt was made to reset an accounts password
    Member added to security-enabled global group
    PowerShell Keylogging Script
    PowerShell Suspicious Script with Audio Capture Capabilities
    More Than 3 Failed Login Attempts Within 1 Hour
    Over 100 Connection from 10 Diff. IPs
    Over 100 Connections Attempted
    Failed Logins Not Followed by Success Within 2 Hours
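
If the trailing fields are needed as well, or the stray trailing space should be dropped, a variant along the following lines may help. This is only a sketch under assumptions: the names ALERT_RE and parse_alert are hypothetical, the severity is assumed to consist of letters only, and the name is matched lazily so that \s+ can absorb any run of spaces before the risk score.

```python
import re

# Hypothetical variant of the pattern above: lazy name, \s+ between
# fields, and named groups for the three trailing fields.
ALERT_RE = re.compile(
    r"^(?P<name>.*?)\s+(?P<risk>\d+)\s+(?P<severity>[A-Za-z]+)\s+(?P<total>\d+)\s*$"
)


def parse_alert(line):
    """Return a dict with the four fields, or None if the line does not match."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "risk": int(m.group("risk")),
        "severity": m.group("severity"),
        "total": int(m.group("total")),
    }


# Sample lines copied from the question.
samples = [
    "A user account was locked out  47 medium 31",
    "Over 100 Connection from 10 Diff. IPs 47 medium 234",
    "More Than 3 Failed Login Attempts Within 1 Hour  47 medium 6",
]
parsed = [parse_alert(s) for s in samples]

for p in parsed:
    print(p)
```

The end-of-line anchor is what keeps the lazy name from stopping at an earlier "digit word digit" run inside the alert name, such as "1 Hour  47".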