Tags: python, regex, python-3.x, data-extraction

Extracting required names from a given format


I have a text file containing the data shown below, and I need to extract the names that follow certain titles. The code below does not give the required result.

The file contains data as below:

Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432

The code I am trying:

import re
pattern = re.compile(r'(Leader|Head\\Organiser|Captain|Vice Captain).*(\w+)',re.I)
matches=pattern.findall(line)
for match in matches:
    print(match)

Expected Output:

Tim Lee
Sam Mathews
Jocey David
Jacob Green

Solution

    import re

    # Raw string: keeps the backslashes in "Head\Organiser" literal and avoids
    # invalid escape sequence warnings.
    line = r'''
    Leader :     Tim Lee ; 34567
    Head\Organiser: Sam Mathews; 11:53 am
    Head: Alica Mills; 45612
    Head\Secretary: Maya Hill; #53190
    Captain- Jocey David # 45123
    Vice Captain:- Jacob Green;  -65432'''
    # Match a title, skip the separator characters, capture one or two words.
    pattern = re.compile(r'(?:Leader|Head(?:\\Organiser|\\Secretary)?|Captain|Vice Captain)\W+(\w+(?:\s+\w+)?)', re.I)
    matches = pattern.findall(line)
    for match in matches:
        print(match)
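For comparison, here is a minimal sketch of why the attempt in the question does not return the names. Its pattern has two capturing groups, so `findall` returns tuples, and the greedy `.*` consumes the rest of the line before `(\w+)` gets a turn. Only one line of the sample data is used, and the variable name `attempt` is just for illustration.

```python
import re

# One line from the question's sample data.
line = "Leader :     Tim Lee ; 34567"

# The pattern from the question, unchanged.
attempt = re.compile(r'(Leader|Head\\Organiser|Captain|Vice Captain).*(\w+)', re.I)

# Two capturing groups, so findall returns (title, tail) tuples.
# The greedy .* runs to the end of the line and gives back a single
# character, so the second group holds only the last word character.
print(attempt.findall(line))  # [('Leader', '7')]
```

Making the title group non-capturing and replacing `.*` with `\W+` are the two changes that make the name land in the only capturing group.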
    

    Explanation:

    (?:                 : start non capture group
      Leader            : literally
     |                  : OR
      Head              : literally
      (?:               : start non capture group
        \\Organiser     : literally
       |                : OR
        \\Secretary     : literally
      )?                : end group, optional
     |                  : OR
      Captain           : literally
     |                  : OR
      Vice Captain      : literally
    )                   : end group
    \W+                 : 1 or more non-word characters
    (                   : start group 1
      \w+               : 1 or more word char
      (?:               : non capture group
        \s+             : 1 or more whitespace characters
        \w+             : 1 or more word char
      )?                : end group, optional
    )                   : end group 1
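The breakdown above can also live inside the pattern itself. The sketch below rewrites the same regex with `re.VERBOSE`, which ignores unescaped whitespace in the pattern and allows `#` comments. The names `text`, `verbose_pattern` and `names` are illustrative, and the space in "Vice Captain" is written as `[ ]` because verbose mode would otherwise drop it.

```python
import re

# Raw string so the backslashes in the sample data are kept literally.
text = r'''
Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432'''

verbose_pattern = re.compile(r'''
    (?:                    # title keyword, not captured
        Leader
      | Head
        (?: \\Organiser | \\Secretary )?   # optional suffix after Head
      | Captain
      | Vice[ ]Captain     # [ ] keeps the space that VERBOSE would ignore
    )
    \W+                    # separator characters between title and name
    (                      # group 1: the name
        \w+                # first word
        (?: \s+ \w+ )?     # optional second word
    )
''', re.IGNORECASE | re.VERBOSE)

names = verbose_pattern.findall(text)
print(names)
```

One caveat that applies to both forms: `\s+` also matches a newline, so a one-word name at the end of a line could pull in the first word of the next line.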
    

    Result for the given example:

    Tim Lee
    Sam Mathews
    Alica Mills
    Maya Hill
    Jocey David
    Jacob Green
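The result above contains six names, while the expected output in the question lists only four. If only Leader, Head\Organiser, Captain and Vice Captain are wanted, one possible adjustment is to drop the bare Head and Head\Secretary alternatives. This is an assumption about the asker's intent, not part of the original answer, and the name `wanted` is illustrative.

```python
import re

# Same sample data as in the question, as a raw string.
text = r'''
Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432'''

# Assumed narrower alternation: "Head" alone and "Head\Secretary" are left
# out, so those two lines no longer match.
wanted = re.compile(
    r'(?:Leader|Head\\Organiser|Vice Captain|Captain)\W+(\w+(?:\s+\w+)?)',
    re.I,
)

for name in wanted.findall(text):
    print(name)
# Tim Lee
# Sam Mathews
# Jocey David
# Jacob Green
```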