How do I get the names from lines like the ones below, using a regex?
line #1==>
Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai
line #2==>
Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav
I've tried
regex = "\s*Elector\'s\sName\:\s([[a-zA-z]*\s[a-zA-z]*\s*[a-zA-z]*]*)\s"
re.findall(regex, line)
It works for line 1 but fails to fetch the last name. For line 2, it fetched only 'Surpam Rajeshwar' from the last name, which actually has three words.
I'd appreciate it if someone could help me with this or suggest a different way to get the names.
You can do that without a regex: split on Elector's Name:, strip whitespace from the resulting items, and drop all empty items:
ss = ["Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
      "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav"]
for s in ss:
    # list() is needed in Python 3, where filter() returns a lazy iterator
    print(list(filter(None, [x.strip() for x in s.split("Elector's Name:")])))
Output:
['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
Just in case you want to study regex, here is a possible regex-based solution:
re.findall(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)", s)
Pattern details
- Elector's Name: - a literal substring
- \s* - 0+ whitespaces
- (.*?) - Group 1 (this value is returned by re.findall): any 0+ chars other than line break chars (with re.DOTALL, including them), as few as possible
- (?=\s*Elector's Name:|$) - a positive lookahead that requires 0+ whitespaces and Elector's Name: after them, or the end of string ($), immediately to the right of the current location.
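For reference, here is a minimal runnable sketch that applies the regex above to both sample lines from the question. The names PATTERN, extract_names, and results are only illustrative and are not part of the original answer; the sketch assumes each input is a single line of text.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
# PATTERN and extract_names are illustrative names, not from the original post.
PATTERN = re.compile(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)")

lines = [
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav",
]


def extract_names(line):
    """Return the contents of Group 1 for every match in the line."""
    return PATTERN.findall(line)


results = [extract_names(line) for line in lines]
for names in results:
    print(names)
# ['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
# ['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
```

Because the lazy group stops at the lookahead rather than at a fixed word count, names of one, two, or three words are all captured whole.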