Python regex: find a whole line that contains multiple commas


I am trying to write a regex pattern that finds an address. The best criterion I could come up with is the number of commas. Here is part of the text; I want to match the second line:

SIA "JET TRAVEL"
Dzirnieku iela 15, Lidosta ,,Riga”, Marupes novads, LV-1053, Latvija
40003789713

I have tried:

address_pattern = '.*(.*?),+.*\n?'
address = re.findall(address_pattern, ocr_text, flags=re.I|re.M)
for match in address:
    print(match)

Shouldn't it match the whole line, since it starts with .* and ends with .*\n? The (.*?) should then cover all possible values (there could be words, numbers and whitespace in between), and ,+ stands for the commas, because I need the line to contain multiple commas.

I have also tried ,{3} to indicate I need 3 commas, but it failed.

Any and all help would be appreciated.


Solution

  • If you use the re.M (re.MULTILINE) flag, you don't need to account for the \n. Instead, use ^ and $, which under that flag match the beginning and end of each line (by default they anchor to the beginning and end of the whole string). re.I has no effect here, since the pattern contains no letters. Note that ,{2,} and similar quantifiers mean "2 or more consecutive commas", so ,{3} only matches three commas in a row.

    re.findall('^.*,{2,}.*$', text, re.MULTILINE)
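
If the commas are not guaranteed to be adjacent, counting them anywhere on the line may be closer to what the question asks for. The sketch below runs the question's pattern, the pattern above, and one possible variant on the sample text. The helper name lines_with_commas and its minimum parameter are made up for illustration and do not come from the original posts.

```python
import re

# Sample text from the question (variable name ocr_text as used there).
ocr_text = (
    'SIA "JET TRAVEL"\n'
    'Dzirnieku iela 15, Lidosta ,,Riga”, Marupes novads, LV-1053, Latvija\n'
    '40003789713'
)

# The question's pattern. With one capture group, findall returns only that
# group, and the lazy (.*?) captures nothing because the greedy .* in front
# of it has already consumed everything up to the last comma.
attempt = re.findall('.*(.*?),+.*\n?', ocr_text, flags=re.I | re.M)

# The pattern from the answer: a line with two or more commas in a row.
consecutive = re.findall(r'^.*,{2,}.*$', ocr_text, flags=re.M)


def lines_with_commas(text, minimum=3):
    """Return whole lines that contain at least `minimum` commas, adjacent or not.

    Hypothetical helper: each repetition of (?:[^,\n]*,) consumes one comma
    plus the non-comma, non-newline text before it, so it cannot cross lines.
    """
    pattern = r'^(?:[^,\n]*,){%d,}.*$' % minimum
    return re.findall(pattern, text, flags=re.M)


print(attempt)
print(consecutive)
print(lines_with_commas(ocr_text))
```

On this sample, attempt comes back as a list holding a single empty string, which is why the original code appeared to print nothing useful, while the other two calls both return the address line. Note that the group in the variant is non-capturing, (?:...), so findall still returns the whole match rather than the group.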