I want to match several different regular expressions against a string. For example:
import re
s = "rechnungsnr. 234342341"
re_nu = re.compile(r".?rechnung[s]*\s*nr[.]*[:]*\s*(\w*\d+[-.]?\d*)")
rn = re_nu.search(s)
rechnungsnr = (rn.groups())
print(rechnungsnr)
print(rn)
print(rn.group(1))
This yields the correct group (the number after the text):
('234342341',)
<re.Match object; span=(0, 22), match='rechnungsnr. 234342341'>
234342341
However, if I extend the regex with '|' alternatives, I get different results:
s = "rechnungsnr. 234342341"
# One alternative per line; adjacent raw string literals are concatenated.
re_nu = re.compile(
    r"rechnungs\s?nummer[:]*\s*(\w*\d+[-.]?\d*)"
    r"|rechnung(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
    r"|.?rechnung[s]*\s*nr[.]*[:]*\s*(\w*\d+[-.]?\d*)"
    r"|belegnummer(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
    r"|beleg(?:s*)[-.]?nr(?:.*)(?::*)(?:\s*)((?:\w*)(?:\d+)[-.]?(?:\d*))"
)
rn = re_nu.search(s)
rechnungsnr = (rn.groups())
print(rechnungsnr)
print(rn)
print(rn.group(1))
That is, I get two None groups before the number I want to extract, so group(1) is None:
(None, None, '234342341', None, None)
<re.Match object; span=(0, 22), match='rechnungsnr. 234342341'>
None
How can I change the code so that the number is always the first group? The goal of the regex is to get the number that follows the label. The label can be any name for an invoice number (in German). For example, the number could come after "rechnungsnummer", but also after "rechnungs nr.", "rechnungs nr:", and so on.
Since rn.groups() returns a tuple, you can filter it with a list comprehension like this:
[item for item in rn.groups() if item is not None]
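To make that concrete, here is a minimal runnable sketch. The two-alternative pattern is a shortened stand-in for the one in the question, and the helper name first_group is made up for illustration; it also guards against the case where nothing matches at all.

```python
import re

# Shortened stand-in for the question's pattern (an assumption, for brevity):
# two alternatives, each with its own capture group.
pattern = re.compile(
    r"rechnungs\s?nummer:*\s*(\w*\d+[-.]?\d*)"
    r"|rechnungs*\s*nr\.*:*\s*(\w*\d+[-.]?\d*)"
)


def first_group(match):
    """Return the first capture group that took part in the match, or None."""
    if match is None:  # search() found nothing
        return None
    matched = [item for item in match.groups() if item is not None]
    return matched[0] if matched else None


print(first_group(pattern.search("rechnungsnr. 234342341")))
print(first_group(pattern.search("rechnungsnummer: 98765")))
print(first_group(pattern.search("no number here")))
```

The first two calls print the extracted number, and the last one prints None because there is no match.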
For example the number could come after "rechnungsnummer" but also after "rechnungs nr." but also after "rechnungs nr:" and so on...
If exactly one of the alternatives matches in every case, the list comprehension will always return a list with a single element, so its first item is the number you want.
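A different approach, if you are free to restructure the pattern, is to put all the label variants into one non-capturing group (?:...) and follow it with a single capture group. Then group(1) is always the number. The sketch below is only an assumption about which label spellings you need; the names INVOICE_NO and extract_invoice_number are hypothetical, and the label alternatives are simplified compared to your original pattern, so adjust them to your data.

```python
import re

# Hypothetical consolidated pattern: every label variant lives in ONE
# non-capturing group, so the number is always capture group 1.
# More specific labels are listed before the bare "rechnung".
INVOICE_NO = re.compile(
    r"(?:rechnungs\s?nummer|rechnungs?\s*nr\.?|rechnung"
    r"|belegnummer|belegs?[-.]?nr\.?)"
    r":*\s*(\w*\d+[-.]?\d*)"
)


def extract_invoice_number(text):
    """Return the number following an invoice label, or None if absent."""
    match = INVOICE_NO.search(text)
    return match.group(1) if match else None


for sample in (
    "rechnungsnr. 234342341",
    "rechnungs nr: 2024-001",
    "rechnung: 4711",
    "belegnummer 555",
    "beleg-nr.: 77",
):
    print(sample, "->", extract_invoice_number(sample))
```

Because there is only one capture group, there is nothing to filter afterwards, and adding another label spelling does not shift any group numbers.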