I'm trying to parse the MAC addresses from arp-scan output. Here's an example:
import re
from subprocess import Popen, PIPE

def get_active_hosts():
    with Popen(['sudo', 'arp-scan', '-l', '-r', '5'], stdout=PIPE) as proc:
        mac_list = re.compile('\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')
        mac_list = mac_list.findall(proc.stdout.read().decode('utf-8'))
    return mac_list

print(get_active_hosts())
But I got this output:
[('4a:c3:26:0e:85:d0', '85:', '0')]
What's going on? How can I capture only the MAC addresses, without this trash:
[('85:', '0')]
Thanks for any advice.
findall is returning every capturing group for each match it found. Groups are declared using a pair of parentheses. Your regular expression contains three groups, as follows:
(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})
([0-9A-Fa-f]{2}:)
([0-9A-Fa-f])
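So now hopefully you understand why findall gives you a tuple of three strings for each match, and why they look like they do: a group that is repeated (here with {5} or {2}) keeps only the text from its last repetition, hence '85:' and '0'.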
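To see the three groups without running arp-scan, here is a minimal sketch that applies the same pattern to a made-up line. The sample string is an assumption about the general shape of arp-scan output (IP, MAC, vendor separated by tabs), not real scan data.

```python
import re

# Hypothetical sample line, shaped like arp-scan output: IP, MAC, vendor.
sample = "192.168.1.10\t4a:c3:26:0e:85:d0\tExample Vendor\n"

# Same pattern as in the question, written as a raw string so that
# the backslashes reach the regex engine unchanged.
pattern = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

match = pattern.search(sample)
groups = match.groups()

# Group 1: the whole address.
# Group 2: only the last of the five "xx:" repetitions.
# Group 3: only the last of the two final hex digits.
print(groups)                    # ('4a:c3:26:0e:85:d0', '85:', '0')

# findall returns one such tuple per match.
print(pattern.findall(sample))   # [('4a:c3:26:0e:85:d0', '85:', '0')]
```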
The solution here is to declare these extra groups (the ones you don't want) to be non-capturing by putting ?: after the opening parenthesis, as follows:
mac_list = re.compile('\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')
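With only one capturing group left, findall returns a plain list of strings instead of tuples. A minimal sketch of that, checked against made-up text rather than a live scan; parse_macs and the sample lines are hypothetical names and data introduced only for illustration:

```python
import re

# The pattern from the answer, as a raw string. Only the outer group
# captures; the inner groups are non-capturing because of "?:".
MAC_RE = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')

def parse_macs(text):
    """Return every whitespace-delimited MAC address found in text."""
    return MAC_RE.findall(text)

# Hypothetical text shaped like arp-scan output: IP, MAC, vendor.
sample = (
    "192.168.1.1\t00:11:22:33:44:55\tExample Vendor A\n"
    "192.168.1.10\t4a:c3:26:0e:85:d0\tExample Vendor B\n"
)

macs = parse_macs(sample)
print(macs)   # ['00:11:22:33:44:55', '4a:c3:26:0e:85:d0']
```

In the original function, the same call would be applied to the decoded output of the arp-scan process.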