Tags: regex, python-3.x, regex-lookarounds, regex-group, regex-greedy

RegEx for matching MAC address or 'N A'


I had a regex that correctly matched a subset of the data I was processing. When I ran it against the full data set, it started failing. I noticed that some values were 'N A' instead of a MAC address or an AP name, depending on the column.

Sample data:

00:0b:85:57:bc:c0     00:0b:85:57:bc:c1     AP1130         10.10.163.217     Joined
00:1c:0f:81:db:80     00:1c:63:23:ac:a0     AP1140         10.10.163.216     Joined
00:1c:0f:81:fc:20     00:1b:d5:9f:7d:b2     AP1            10.10.163.215     Joined
00:1c:0f:81:fc:20     N A                   N A            10.10.163.215     Not joined
00:21:1b:ea:36:60     00:0c:d4:8a:6b:c1     AP2            10.10.163.214     Joined

Regexp:

((?:(?:[0-9a-f]{2}[:-]){5})(?:[0-9a-f]{2}))(?:\s+?)(((?:(?:[0-9a-f]{2}[:-]){5})(?:[0-9a-f]{2}))|(N A))(?:\s+)((AP.+?)|(N A))(?:\s)

I have modified my regex, but it still isn't matching either the MAC address or 'N A'. The same applies to the name field, which should match either the AP name or 'N A'.

My work as it stands: https://regex101.com/r/sgGEzh/1

I assume my brackets are not correct, but I can't see where my (|) OR operator is failing. I am getting duplication of some groups now.

I should match the first MAC address, then the second MAC address or the string 'N A', and finally the AP name or the string 'N A'.

I should always get three matching groups per line, and I did until I tried to process the 'N A' strings.


Solution

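  • You're matching the correct things; you just need to get rid of the unneeded capturing groups around N A and AP.+?. Every pair of capturing parentheses gets its own group number, so these inner groups make the same text show up in more than one group, while the alternative that did not match is left empty. You only need 3 capturing groups.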

    You have a number of other groups that aren't really needed, like the non-capturing group around \s+?. You don't need a group around each | alternative if they're already inside a group. The only non-capturing group you need is the one around [0-9a-f]{2}[:-] when it's being quantified.

    The following works and removes all the redundant groups:

    ((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2})\s+?((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2}|N A)\s+(AP.+?|N A)\s
    

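To check the difference yourself, here is a minimal Python sketch (Python is assumed from the question's tags). It compiles both the regex from the question and the simplified one, compares how many capturing groups each defines, and runs the simplified pattern over the sample rows. The names SAMPLE, ORIGINAL and FIXED are just illustrative.

```python
import re

# Sample rows copied from the question.
SAMPLE = """\
00:0b:85:57:bc:c0     00:0b:85:57:bc:c1     AP1130         10.10.163.217     Joined
00:1c:0f:81:db:80     00:1c:63:23:ac:a0     AP1140         10.10.163.216     Joined
00:1c:0f:81:fc:20     00:1b:d5:9f:7d:b2     AP1            10.10.163.215     Joined
00:1c:0f:81:fc:20     N A                   N A            10.10.163.215     Not joined
00:21:1b:ea:36:60     00:0c:d4:8a:6b:c1     AP2            10.10.163.214     Joined
"""

# The regex from the question: nested capturing groups around each alternative.
ORIGINAL = re.compile(
    r"((?:(?:[0-9a-f]{2}[:-]){5})(?:[0-9a-f]{2}))(?:\s+?)"
    r"(((?:(?:[0-9a-f]{2}[:-]){5})(?:[0-9a-f]{2}))|(N A))(?:\s+)"
    r"((AP.+?)|(N A))(?:\s)"
)

# The simplified regex: exactly one capturing group per field.
FIXED = re.compile(
    r"((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2})\s+?"
    r"((?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2}|N A)\s+"
    r"(AP.+?|N A)\s"
)

# Pattern.groups is the number of capturing groups in the pattern.
print(ORIGINAL.groups, FIXED.groups)  # 7 3

# One (first MAC, second MAC or 'N A', AP name or 'N A') tuple per row.
rows = [m.groups() for m in FIXED.finditer(SAMPLE)]
for row in rows:
    print(row)
```

With the original pattern, the 'N A' row still matches, but the text lands in groups 2, 4, 5 and 7 while groups 3 and 6 are None, which is the duplication described in the question. Note that AP.+? needs at least one character after "AP", so a name that is exactly "AP" would need a different pattern.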