I am trying to write a regex pattern that finds an address. The best I could come up with is to rely on the number of commas. Here is part of the text; I want to match the second line:
SIA "JET TRAVEL"
Dzirnieku iela 15, Lidosta ,,Riga”, Marupes novads, LV-1053, Latvija
40003789713
I have tried:
address_pattern = '.*(.*?),+.*\n?'
address = re.findall(address_pattern, ocr_text, flags=re.I|re.M)
for match in address:
    print(match)
The pattern starts with .* and ends with .*\n?, so shouldn't it match the whole line? The (.*?) should then include all possible values (since there could be words, numbers and whitespace in between), and the comma part (,+) is there because I need the line to contain multiple commas.
I have also tried ,{3} to indicate I need 3 commas, but it failed.
Any and all help would be appreciated.
If you use the re.M(ULTILINE) flag, you don't need to account for the \n. Instead, use ^ and $ to match the beginning and end of each line, respectively (different from their default behavior). re.I has no effect here, because the pattern contains no letters. Note that {2,} and the like mean "2 or more consecutive commas".
re.findall('^.*,{2,}.*$', text, re.MULTILINE)
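As a rough sketch, here is the pattern above run against the sample text from the question, next to the original attempt. The third pattern is an assumption about the intent ("at least three commas anywhere on the line, not necessarily adjacent") and is not part of the answer above; the variable names are made up for illustration.

```python
import re

# Sample text from the question, rebuilt line by line.
lines = [
    'SIA "JET TRAVEL"',
    'Dzirnieku iela 15, Lidosta ,,Riga”, Marupes novads, LV-1053, Latvija',
    '40003789713',
]
ocr_text = "\n".join(lines)

# Original attempt: findall returns only the capture group, and the greedy
# leading .* leaves nothing for the lazy (.*?) to capture.
original = re.findall('.*(.*?),+.*\n?', ocr_text, flags=re.I | re.M)

# Pattern from the answer: a whole line with two or more consecutive commas.
consecutive = re.findall(r'^.*,{2,}.*$', ocr_text, flags=re.MULTILINE)

# Assumed variant: a whole line with at least three commas anywhere.
# [^,\n]* stops at commas and line breaks, so the count stays on one line.
at_least_three = re.findall(r'^(?:[^,\n]*,){3,}.*$', ocr_text, flags=re.MULTILINE)

for name, result in [("original", original),
                     ("consecutive", consecutive),
                     ("at_least_three", at_least_three)]:
    print(name, result)
```

The original attempt yields a single empty string, since only the group is returned. Both other patterns return the full address line for this sample, but only because it happens to contain ",,"; an address without adjacent commas would need something like the counting variant.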