Tags: python, regex, python-re

Regex group doesn't match with "?" even if it should


input strings:

"| VLAN56                    | LAB06    | Labor 06                          | 56     | 172.16.56.0/24   | VLAN56_LAB06       | ✔️           |            |",
"| VLAN57                    | LAB07    | Labor 07                          | 57     | 172.16.57.0/24   | VLAN57_LAB07       | ✔️           | @#848484:  |"

regex:

'\|\s+(\d+).+(VLAN\d+_[0-9A-Za-z]+)\s+\|.+(#[0-9A-Fa-f]{6})?'

The goal is to capture the VLAN number, the hostname and, if there is one, the color code. With the "?" after the last group, however, the color code is ignored every time, even when it should match: the last capture group is always None.


Solution

  • You may use this regex:

    \|\s+(\d+).+(VLAN\d+_[0-9A-Za-z]+)\s+\|[^|]+\|[^#|]*(#[0-9A-Fa-f]{6})?
    

    You have a demo here: https://regex101.com/r/SWe42v/1

    Your regex didn't work because .+ is a greedy quantifier: it matches as much as it can.

    Once you add the ? to the last group, the engine has no reason to backtrack. The final .+ consumes the rest of the line, and the optional group then matches the empty string at the very end. The overall match succeeds with the group capturing nothing, which is valid because the group is optional.

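    To fix it, match the column with the emoji explicitly. You don't care about its content, so you can use \|[^|]+ to skip over it. The [^#|]* that follows then consumes only the characters before the color code, without running past a # or into the next column.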

    This sort of construct is widely used in regexes: SEPARATOR[^SEPARATOR]*
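
As a quick check, the sketch below runs both patterns from this thread against the two input rows using Python's re module. The names rows, original, fixed and extract are only illustrative and are not part of the original question or answer.

```python
import re

# The two table rows from the question.
rows = [
    "| VLAN56                    | LAB06    | Labor 06                          | 56     | 172.16.56.0/24   | VLAN56_LAB06       | ✔️           |            |",
    "| VLAN57                    | LAB07    | Labor 07                          | 57     | 172.16.57.0/24   | VLAN57_LAB07       | ✔️           | @#848484:  |",
]

# Original pattern: the trailing greedy .+ swallows the color code,
# so the optional group matches nothing.
original = re.compile(
    r'\|\s+(\d+).+(VLAN\d+_[0-9A-Za-z]+)\s+\|.+(#[0-9A-Fa-f]{6})?'
)

# Fixed pattern: \|[^|]+ skips the emoji column and [^#|]* stops
# right before a possible color code.
fixed = re.compile(
    r'\|\s+(\d+).+(VLAN\d+_[0-9A-Za-z]+)\s+\|[^|]+\|[^#|]*(#[0-9A-Fa-f]{6})?'
)

def extract(pattern, row):
    """Return the captured groups as a tuple, or None if nothing matches."""
    match = pattern.search(row)
    return match.groups() if match else None

original_results = [extract(original, row) for row in rows]
fixed_results = [extract(fixed, row) for row in rows]

for row_original, row_fixed in zip(original_results, fixed_results):
    print("original:", row_original)
    print("fixed:   ", row_fixed)
```

With the original pattern the third group should come back as None for both rows. With the fixed pattern it should be None only for the row without a color code.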