I have posted slightly modified logs below.
I have a regex to match three different groups in one log line: the time, the IP, and the number of messages that the SMTP server received.
I tried it with the following regex: (\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}).*(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})..disconnected\.?\s+(\d+) message\[s\]
The only problem is the second group, the IP. For example, in the first line the IP is 11.132.8.61, but what RegExr captures is only 1.132.8.6, so it leaves some digits out. I thought that \d{1,3} would match all two or three digits if there is more than one. It does so in the second octet, but not in the first or the last.
[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-07F8] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:00:08 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:04:51 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-0FF0] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-120C] 30.11.2020 05:10:05 SMTP Server: bsicip03.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0014-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000A-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
The expected output would be:
match[1] = 01.12.2020 01:00:07
match[2] = 11.132.8.61
match[3] = 1
Change .* to .*? (or, given that you can expect at least one character to occur between the capture groups, .+?) to make the subexpression non-greedy.
That way, .* doesn't "steal" up to two leading digits from what the following \d{1,3} subexpression matches.
To give a simple example:
# !! BROKEN: greedy.
PS> if (' 123' -match '.*(\d{1,3})') { $Matches[1] }
3 # !! Only the LAST digit matched, because .* matched as much as it
# !! could while still matching \d{1,3}
# OK: non-greedy.
PS> if (' 123' -match '.*?(\d{1,3})') { $Matches[1] }
123 # OK - all 3 digits matched, because .*? matched as little as it
# could while still matching \d{1,3}
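The same effect can be reproduced on a shortened version of the first log line. This is only a sketch: the sample line is trimmed, the pattern is reduced to the IP part, and the variable names ($line, $ip, $greedy, $lazyFront, $lazyBoth) are made up for illustration. It also suggests why the fixed-width .. before disconnected is worth replacing with .+? as well.

```powershell
# Shortened sample line (assumption: trimmed from the first log line in the question).
$line = 'SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'

# The IP subexpression, shared by all three patterns below.
$ip = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Greedy .* in front, fixed-width .. behind: each neighbor takes a digit.
$greedy = if ($line -match ('.*' + $ip + '..disconnected')) { $Matches[1] }

# Non-greedy .*? in front only: the leading digit is kept, but .. still
# has to consume two characters and takes the last digit.
$lazyFront = if ($line -match ('.*?' + $ip + '..disconnected')) { $Matches[1] }

# Non-greedy on both sides: the full address is captured.
$lazyBoth = if ($line -match ('.*?' + $ip + '.+?disconnected')) { $Matches[1] }

$greedy      # 1.132.8.6
$lazyFront   # 11.132.8.6
$lazyBoth    # 11.132.8.61
```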
To put it all together (note that I'm also using .+? in lieu of .. before disconnected):
'[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
ForEach-Object {
  if ($_ -match '(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}).+?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?disconnected\.?\s+(\d+) message\[s\]') {
    [pscustomobject] @{
      Count     = $Matches[3]
      Timestamp = $Matches[1]
      IP        = $Matches[2]
    }
  }
}
The above yields:
Count Timestamp           IP
----- ---------           --
1     01.12.2020 01:00:07 11.132.8.61
1     30.11.2020 05:08:59 12.99.81.53
Note: You could additionally place word-boundary assertions, \b, around subexpressions such as \d{1,3} so that they don't accidentally match inside longer runs of digits, or you could explicitly stipulate that a non-digit (\D) precede and follow.

Alternative solution using the -split operator:
As Lee Daley points out, you could use -split, the string-splitting operator, to split your lines into fields, as a conceptually simpler alternative to regexes:
'[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
ForEach-Object {
  $fields = -split $_
  if ($fields[-4] -eq 'disconnected.') {
    [pscustomobject] @{
      Count     = $fields[-3]
      Timestamp = '{0} {1}' -f $fields[1], $fields[2]
      IP        = $fields[-5].Trim('()')
    }
  }
}
The above yields the same as the regex-based solution.
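
As a rough illustration of the note about \b on the regex-based solution, the sketch below compares the IP subexpression with and without word-boundary assertions. The malformed input line (a first "octet" with four digits) is hypothetical and does not come from the logs above, and the variable names are invented for the example.

```powershell
# The IP subexpression from the regex-based solution.
$ip = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Hypothetical line whose first number has four digits, so it is not a valid address.
$odd = 'SMTP Server: 1234.132.8.61 disconnected.'

# Without \b, the pattern matches inside the longer run of digits.
$loose = if ($odd -match ('.+?' + $ip)) { $Matches[1] }

# With \b on both sides, the match must start and end at a word boundary,
# so the malformed value is rejected.
$strict = $odd -match ('.+?\b' + $ip + '\b')

# A well-formed address is still captured in full.
$normal = if ('SMTP Server: 11.132.8.61 disconnected.' -match ('.+?\b' + $ip + '\b')) { $Matches[1] }

$loose    # 234.132.8.61
$strict   # False
$normal   # 11.132.8.61
```

Note that \b only guards against adjacent word characters (letters, digits, underscore); it does not check that each octet is in the range 0 to 255.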