Tags: python, python-3.x, mac-address, arp

Python 3: arp-scan and MAC address parsing


I'm trying to parse MAC addresses from arp-scan output. Here's an example:

import re
from subprocess import Popen, PIPE

def get_active_hosts():
    with Popen(['sudo', 'arp-scan', '-l', '-r', '5'], stdout=PIPE) as proc:
        mac_list = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')
        mac_list = mac_list.findall(proc.stdout.read().decode('utf-8'))
    return mac_list


print(get_active_hosts())

But I got this output:

[('4a:c3:26:0e:85:d0', '85:', '0')]

What's going on? How can I capture only the MAC addresses, without this extra data:

[('85:', '0')]

Thanks for any advice.


Solution

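  • When a pattern contains capturing groups, findall returns the groups rather than the whole match; with more than one group, each match becomes a tuple with one element per group. Groups are declared with a pair of parentheses, and your regular expression contains three of them: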

    (([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})
    ([0-9A-Fa-f]{2}:)
    ([0-9A-Fa-f])
    

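    So findall found a single MAC address, but returned it as a three-element tuple: the whole address from the first group, followed by the other two groups. A group that is repeated by a quantifier such as {5} or {2} keeps only its last repetition, which is why you get '85:' and '0' rather than all of the pieces.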

    The fix is to make the groups you don't want non-capturing by putting ?: right after their opening parenthesis. With only one capturing group left, findall returns a plain list of strings:

    mac_list = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')
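A minimal, self-contained sketch of the above, assuming arp-scan prints one host per line as IP, MAC and vendor separated by tabs. The SAMPLE text is made up so the parsing can be tried without sudo or a live network, and the helper names (SAMPLE, ORIGINAL, FIXED, parse_macs, parse_macs_finditer) are hypothetical. It shows the tuples the original pattern produces, the fixed pattern, and an alternative that keeps the original pattern but reads only group(1) from each re.finditer match.

```python
import re
import subprocess
from typing import List

# Hypothetical arp-scan host lines (IP<TAB>MAC<TAB>vendor); the exact layout
# is an assumption so the parsing can run without sudo or a network.
SAMPLE = (
    "192.168.1.1\t4a:c3:26:0e:85:d0\t(Unknown)\n"
    "192.168.1.20\t02:42:ac:11:00:02\t(Unknown)\n"
)

# Pattern from the question: three capturing groups, so findall yields 3-tuples.
ORIGINAL = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

# Fixed pattern: the inner groups are non-capturing, so findall yields strings.
FIXED = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')


def parse_macs(text: str) -> List[str]:
    """Return every MAC address found in arp-scan style text (hypothetical helper)."""
    return FIXED.findall(text)


def parse_macs_finditer(text: str) -> List[str]:
    """Alternative: keep the original pattern and read only group 1 of each match."""
    return [m.group(1) for m in ORIGINAL.finditer(text)]


def get_active_hosts() -> List[str]:
    """Run the same arp-scan command as the question (needs sudo and arp-scan)."""
    result = subprocess.run(
        ['sudo', 'arp-scan', '-l', '-r', '5'],
        capture_output=True,
        encoding='utf-8',  # decode stdout as UTF-8 text, like the original code
        check=True,        # raise CalledProcessError if arp-scan exits non-zero
    )
    return parse_macs(result.stdout)


if __name__ == '__main__':
    # Offline demo on SAMPLE; call get_active_hosts() on a real machine instead.
    print(ORIGINAL.findall(SAMPLE))
    # [('4a:c3:26:0e:85:d0', '85:', '0'), ('02:42:ac:11:00:02', '00:', '2')]
    print(parse_macs(SAMPLE))
    # ['4a:c3:26:0e:85:d0', '02:42:ac:11:00:02']
    print(parse_macs_finditer(SAMPLE))
    # ['4a:c3:26:0e:85:d0', '02:42:ac:11:00:02']
```

Because the \s+ on each side is part of the match, a MAC address at the very start or very end of the text is not found. That is harmless for arp-scan's tab-separated host lines, but worth keeping in mind if you reuse the pattern elsewhere.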