Tags: python, regex, python-2.7, python-3.x, names

Python regular expressions - how to get all the names in a line?


How do I get the names from lines like the ones below, using regex?

Line 1:
Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai 

Line 2:
Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav

I've tried

regex = "\s*Elector\'s\sName\:\s([[a-zA-z]*\s[a-zA-z]*\s*[a-zA-z]*]*)\s" 
re.findall(regex, line)

It works on line 1, but it is not able to fetch the last name. For line 2, it fetches only 'Surpam Rajeshwar' from the last name, which actually has three words in it.

I would appreciate it if someone could help me with this or suggest a different way to get the names.


Solution

  • You can do that without a regex by splitting on Elector's Name:, stripping whitespace from the resulting items, and dropping all empty items:

    ss = ["Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai",
          "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav"]
    for s in ss:
        # list(...) is needed on Python 3, where filter() returns a lazy iterator
        print(list(filter(None, [x.strip() for x in s.split("Elector's Name:")])))
    

    Output:

    ['Surpam Badurubai', 'Madavimaru', 'Madavitannubai']
    ['GEDAM KARNU', 'GEDAM BHEEM BAI', 'Surpam Rajeshwar Rav']
    

    In case you want to study regex, here is a possible regex-based solution:

    re.findall(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)", s) 
    


    Pattern details

    • Elector's Name: - a literal substring
    • \s* - zero or more whitespace characters
    • (.*?) - Group 1 (the value that re.findall returns): zero or more characters other than line breaks (or including them when re.DOTALL is set), as few as possible
    • (?=\s*Elector's Name:|$) - a positive lookahead that requires, immediately to the right of the current position, either zero or more whitespace characters followed by Elector's Name: or the end of the string ($).
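
To try the regex on both sample lines, here is a minimal, self-contained sketch. The names NAME_RE and extract_names are made up for this illustration and are not part of the original answer. Stripping the input first is an added assumption: the first line in the question ends with a space, and without the strip the last name would keep that trailing space, because $ only matches at the very end of the string.

```python
import re

# Hypothetical helper names, used only for this illustration.
# The lookahead stops each name at the next label or at the end of the string.
NAME_RE = re.compile(r"Elector's Name:\s*(.*?)(?=\s*Elector's Name:|$)")


def extract_names(line):
    """Return the names that follow each "Elector's Name:" label in line."""
    # strip() removes trailing whitespace so the last name is not padded
    return NAME_RE.findall(line.strip())


lines = [
    "Elector's Name: Surpam Badurubai Elector's Name: Madavimaru Elector's Name: Madavitannubai ",
    "Elector's Name: GEDAM KARNU Elector's Name: GEDAM BHEEM BAI Elector's Name: Surpam Rajeshwar Rav",
]

for line in lines:
    print(extract_names(line))
```

Compiling the pattern once is optional here; it mainly helps if the function is called on many lines.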