python, regex, data-extraction

Insert values in dictionary using regex which includes key in the pattern


I am trying to extract data from a PDF file, so I read each line of the converted text file into a list. I have a predefined list of keys, and I want to build a dictionary that maps each key to the corresponding value extracted from the text. For example, the file would contain:

Name  : Luke Cameron 
Age and Sex : 37/Male
Haemoglobin       13.0            g/dL

I have a predefined list such as keys = ['Name', 'Age', 'Sex']

My code is

for text in lines:
    rx_dict = {elem:re.search(str(elem)+r':\s+\w+.\s\w+',text) for elem in keys}

The output:

{'Name': None,
 'Age': None,
 'Sex': None
}

Desired output:

{'Name': 'Luke Cameron',
 'Age': '37',
 'Sex': 'Male'
}

NOTE: This isn't real data; any resemblance is coincidental.


Solution

  • You could use

    import re
    
    data = """
    Name  : Luke Cameron 
    Age and Sex : 37/Male
    Haemoglobin       13.0            g/dL"""
    
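    # One "key : value" pair per line; lines without a colon are skipped
    rx = re.compile(r'^(?P<key>[^:\n]+):(?P<value>.+)', re.M)
    
    result = {}
    for match in rx.finditer(data):
        key = match.group('key').rstrip()
        value = match.group('value').strip()
        try:
            # Combined entries such as "Age and Sex : 37/Male" split into two pairs
            key1, key2 = key.split(" and ")
            value1, value2 = value.split("/")
            result.update({key1: value1, key2: value2})
        except ValueError:
            # Unpacking failed, so this is an ordinary single pair
            result.update({key: value})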
    
    print(result)
    

    Which yields

    {'Name': 'Luke Cameron', 'Age': '37', 'Sex': 'Male'}
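
The pattern in the question returns None because it expects the colon directly after the key, while the file has whitespace before it ("Name  :") and puts "Age" and "Sex" on one combined line. If you want to work from the list of lines and keep only the predefined keys, a minimal sketch along the same idea could look like this. The function name extract and the sample lines list are assumptions for illustration, and the " and " / "/" conventions are taken from the sample data only.

```python
import re

# Hypothetical input: lines as read from the converted text file
lines = [
    "Name  : Luke Cameron ",
    "Age and Sex : 37/Male",
    "Haemoglobin       13.0            g/dL",
]
keys = ["Name", "Age", "Sex"]  # predefined list from the question

rx = re.compile(r'^(?P<key>[^:\n]+):(?P<value>.+)')


def extract(lines, keys):
    """Return {key: value} for the predefined keys found in lines."""
    result = {}
    for line in lines:
        match = rx.match(line)
        if match is None:
            continue  # no colon on this line, e.g. the Haemoglobin row
        names = [k.strip() for k in match.group('key').split(" and ")]
        values = [v.strip() for v in match.group('value').split("/")]
        if len(names) != len(values):
            # Counts differ, so treat the whole line as a single pair
            names = [match.group('key').strip()]
            values = [match.group('value').strip()]
        for name, value in zip(names, values):
            if name in keys:  # drop anything not in the predefined list
                result[name] = value
    return result


print(extract(lines, keys))
```

Comparing the number of names and values, rather than relying on an unpacking error, keeps a value that happens to contain a slash (a date, for instance) in one piece when its key is not a combined one.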