I want to extract a specific MAC address from a log file, where it can appear in different formats.
For example, on these three lines:
Jun 16 10:24:28 (2248) Login OK: cli 88-c9-d0-fd-13-65 via TLS tunnel)
Jun 16 10:24:35 (2258) Login OK: cli f8:a9:d0:72:0a:dd via TLS tunnel)
Jun 16 10:24:44 (2273) Login OK: cli 485a.3f12.a35a via TLS tunnel)
with this regex:
([[:xdigit:]]{2}[:.-]?){5}[[:xdigit:]]{2}
I can find all the MAC addresses from within the Linux less command.
Assuming I want to search for 48:5a:3f:12:a3:5a, how do I apply the same syntax to a specific MAC address in Python?
I tried to write something like this:
regex = re.compile(r'([[:xdigit:]]{2}[:.-]?){5}[[:xdigit:]]{2}')
for line in file:
    match = regex.search(line)
but obviously it doesn't work.
You may use
r'\b[a-f0-9]{2}(?:([:-]?)[a-f0-9]{2}(?:\1[a-f0-9]{2}){4}|(?:\.?[a-f0-9]{2}){5})\b'
See the regex demo (compile the regex object with the re.I flag).
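For context on why the attempt in the question finds nothing: Python's re module does not support POSIX bracket classes such as [[:xdigit:]], so the hex digits have to be spelled out as [a-f0-9] (or [0-9a-fA-F]). The following is a minimal sketch of that difference; the variable names are illustrative only, and the second pattern is simply a direct translation of the regex from the question, not the pattern recommended above.

```python
import re
import warnings

line = "Jun 16 10:24:44 (2273) Login OK: cli 485a.3f12.a35a via TLS tunnel)"

# Python parses "[[:xdigit:]" as a set of the literal characters
# "[", ":", "x", "d", "i", "g", "t", followed by a literal "]".
with warnings.catch_warnings():
    # Silence the "possible nested set" FutureWarning for this demo.
    warnings.simplefilter("ignore", FutureWarning)
    posix_style = re.compile(r'([[:xdigit:]]{2}[:.-]?){5}[[:xdigit:]]{2}')

# Direct translation of the question's regex with an explicit hex class.
python_style = re.compile(r'(?:[0-9a-f]{2}[:.-]?){5}[0-9a-f]{2}', re.IGNORECASE)

posix_match = posix_style.search(line)
python_match = python_style.search(line)

print(posix_match)           # None: the POSIX class is not understood
print(python_match.group())  # 485a.3f12.a35a
```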
Explanation:
\b - leading word boundary
[a-f0-9]{2} - 2 xdigits
(?: - start of a non-capturing group with 2 alternative patterns:
    ([:-]?)[a-f0-9]{2}(?:\1[a-f0-9]{2}){4}:
        ([:-]?) - Group 1 capturing an optional delimiter that is either : or -
        [a-f0-9]{2} - 2 xdigits
        (?:\1[a-f0-9]{2}){4} - 4 sequences of the delimiter in Group 1 and 2 xdigits
    | - or
    (?:\.?[a-f0-9]{2}){5}) - 5 sequences of an optional (1 or 0) dot (\.?) and 2 xdigits
\b - trailing word boundary

import re
p = re.compile(r'\b[a-f0-9]{2}(?:([:-]?)[a-f0-9]{2}(?:\1[a-f0-9]{2}){4}|(?:\.?[a-f0-9]{2}){5})\b', re.IGNORECASE)
s = "Jun 16 10:24:28 (2248) Login OK: cli 88-c9-d0-fd-13-65 via TLS tunnel)\nJun 16 10:24:35 (2258) Login OK: cli f8:a9:d0:72:0a:dd via TLS tunnel)\nJun 16 10:24:44 (2273) Login OK: cli 485a.3f12.a35a via TLS tunnel)"
print([x.group() for x in p.finditer(s)])
# => ['88-c9-d0-fd-13-65', 'f8:a9:d0:72:0a:dd', '485a.3f12.a35a']
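To search for one specific address such as 48:5a:3f:12:a3:5a regardless of how it is written in the log, one possible approach is to find every MAC-like token with the pattern above and compare it after stripping the delimiters. This is only a sketch: the helper names normalize and lines_with_mac are made up for illustration, and the log is a hard-coded list rather than a real file.

```python
import re

MAC_RE = re.compile(
    r'\b[a-f0-9]{2}(?:([:-]?)[a-f0-9]{2}(?:\1[a-f0-9]{2}){4}|(?:\.?[a-f0-9]{2}){5})\b',
    re.IGNORECASE,
)

def normalize(mac):
    """Drop ':', '-' and '.' and lowercase, so all notations compare equal."""
    return re.sub(r'[:.-]', '', mac).lower()

def lines_with_mac(lines, wanted):
    """Yield the lines that contain `wanted` in any delimiter style."""
    target = normalize(wanted)
    for line in lines:
        if any(normalize(m.group()) == target for m in MAC_RE.finditer(line)):
            yield line

# Sample lines taken from the question.
log = [
    "Jun 16 10:24:28 (2248) Login OK: cli 88-c9-d0-fd-13-65 via TLS tunnel)",
    "Jun 16 10:24:35 (2258) Login OK: cli f8:a9:d0:72:0a:dd via TLS tunnel)",
    "Jun 16 10:24:44 (2273) Login OK: cli 485a.3f12.a35a via TLS tunnel)",
]

hits = list(lines_with_mac(log, "48:5a:3f:12:a3:5a"))
print(hits)
# => ['Jun 16 10:24:44 (2273) Login OK: cli 485a.3f12.a35a via TLS tunnel)']
```

With a real log, the list can be replaced by an open file object, since iterating over a file also yields lines.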