python, python-re

How can I group RegEx correctly with '|'?


I want to match several regular expressions against a string. For example:

import re

s = "rechnungsnr. 234342341"

re_nu = re.compile(r".?rechnung[s]*\s*nr[.]*[:]*\s*(\w*\d+[-.]?\d*)")

rn = re_nu.search(s)
rechnungsnr = (rn.groups())
print(rechnungsnr)
print(rn)
print(rn.group(1))

This gives me the correct group (the number after the text):

('234342341',)
<re.Match object; span=(0, 22), match='rechnungsnr. 234342341'>
234342341

However, if I expand the RegEx with '|' I get different results:

s = "rechnungsnr. 234342341"

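# Adjacent string literals are concatenated. A backslash-newline inside a
# raw string would end up in the pattern instead of continuing the line.
re_nu = re.compile(
    r"rechnungs\s?nummer[:]*\s*(\w*\d+[-.]?\d*)"
    r"|rechnung(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
    r"|.?rechnung[s]*\s*nr[.]*[:]*\s*(\w*\d+[-.]?\d*)"
    r"|belegnummer(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
    # "\." matches a literal dot; a bare ".*" would swallow most of the number.
    r"|beleg(?:s*)[-.]?nr(?:\.*)(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
)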

rn = re_nu.search(s)
rechnungsnr = (rn.groups())
print(rechnungsnr)
print(rn)
print(rn.group(1))

That is, I get two None groups before the number I want to extract:

(None, None, '234342341', None, None)
<re.Match object; span=(0, 22), match='rechnungsnr. 234342341'>
None

How can I change the code so that the number is always the first group? The goal of the regex is to extract the number that follows the label. The label can be any German name for an invoice number: for example, the number could come after "rechnungsnummer", but also after "rechnungs nr.", "rechnungs nr:", and so on.


Solution

  • Since rn.groups() returns a tuple, you can drop the None entries with a list comprehension:

    [item for item in rn.groups() if item is not None]
    

    For example the number could come after "rechnungsnummer" but also after "rechnungs nr." but also after "rechnungs nr:" and so on...

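    If this holds in all cases, only one alternative (and therefore only one capture group) takes part in any match, so the list comprehension always returns a list with a single element. Its first item is the number.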
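
To make the filtering concrete, here is a minimal sketch rather than a definitive implementation. The helper name extract_number and the pattern RE_SINGLE are hypothetical and not part of the original post, and only three of the five alternatives are reproduced for brevity. The first part wraps the list comprehension in a function. The second part shows a different technique, not the one from the answer: a non-capturing alternation of labels followed by one shared capture group, so the number is always group 1.

```python
import re

# Three of the alternatives from the question, one capture group per branch.
RE_NU = re.compile(
    r"rechnungs\s?nummer[:]*\s*(\w*\d+[-.]?\d*)"
    r"|rechnung(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
    r"|.?rechnung[s]*\s*nr[.]*[:]*\s*(\w*\d+[-.]?\d*)"
)


def extract_number(text):
    """Return the captured number, or None if no alternative matches."""
    match = RE_NU.search(text)
    if match is None:
        return None
    # Only the branch that matched fills its group; the others stay None.
    found = [item for item in match.groups() if item is not None]
    return found[0] if found else None


# Assumed alternative layout: the labels share a single capture group,
# so the number is always available as group(1).
RE_SINGLE = re.compile(
    r"(?:rechnungs\s?nummer|rechnungs?\s*nr\.*|rechnung)"
    r":*\s*(\w*\d+[-.]?\d*)"
)

print(extract_number("rechnungsnr. 234342341"))           # 234342341
print(RE_SINGLE.search("rechnungs nr: 4711").group(1))    # 4711
```

The helper also guards against search() returning None, which the snippets in the question would hit with an AttributeError on input that contains no invoice label.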