I am trying to extract data from a PDF file, so I converted it to text and read each line into a list. I have a predefined list of keys, and I want to build a dictionary that maps each key to its corresponding value. For example, the file might contain:
Name : Luke Cameron
Age and Sex : 37/Male
Haemoglobin 13.0 g/dL
My predefined list of keys is:
keys = ['Name', 'Age', 'Sex']
My code is:
import re

for text in lines:
    rx_dict = {elem: re.search(str(elem) + r':\s+\w+.\s\w+', text) for elem in keys}
The output is:
{'Name': None,
 'Age': None,
 'Sex': None}
Desired output:
{'Name': 'Luke Cameron',
 'Age': 37,
 'Sex': 'Male'}
NOTE: This isn't real data; any resemblance is purely coincidental.
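The None values most likely come from the pattern itself. In the sample lines there is a space before the colon (`Name : ...`), while `str(elem) + r':\s+...'` requires the colon to follow the key immediately. Also, `Age` and `Sex` only appear inside the combined label `Age and Sex`. A small check, assuming the sample lines shown above:

```python
import re

line = "Name : Luke Cameron"

# The original pattern expects "Name:" with no space, so the search fails
original = re.search('Name' + r':\s+\w+.\s\w+', line)
print(original)  # None

# Allowing optional whitespace around the colon lets the same key match
tolerant = re.search(re.escape('Name') + r'\s*:\s*(.+)', line)
print(tolerant.group(1))  # Luke Cameron

# Even the tolerant pattern cannot find 'Age' alone: it is followed by ' and Sex'
combined = re.search(re.escape('Age') + r'\s*:\s*(.+)', "Age and Sex : 37/Male")
print(combined)  # None
```

This is why a per-key search needs extra handling for the combined `Age and Sex` line.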
You could use
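import re

data = """
Name : Luke Cameron
Age and Sex : 37/Male
Haemoglobin 13.0 g/dL"""

# Match every "key : value" line; lines without a colon (e.g. Haemoglobin) are skipped
rx = re.compile(r'^(?P<key>[^:\n]+):(?P<value>.+)', re.M)

result = {}
for match in rx.finditer(data):
    key = match.group('key').rstrip()
    value = match.group('value').strip()
    try:
        # Combined entries such as "Age and Sex : 37/Male" become two pairs
        key1, key2 = key.split(" and ")
        value1, value2 = value.split("/")
        result.update({key1: value1, key2: value2})
    except ValueError:
        # Unpacking fails for plain entries, which are stored as-is
        result.update({key: value})

print(result)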
Which yields
{'Name': 'Luke Cameron', 'Age': '37', 'Sex': 'Male'}
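Since the question reads the converted text into a list `lines` and only wants the entries named in `keys`, the same idea can be applied line by line and filtered. This is a sketch assuming each item of `lines` is one line of the converted text as a plain string; the helper name `parse_lines` is hypothetical:

```python
import re

# Same "key : value" pattern, applied to one line at a time
rx = re.compile(r'^(?P<key>[^:]+):(?P<value>.+)')


def parse_lines(lines, keys):
    """Hypothetical helper: build {key: value} for the wanted keys."""
    result = {}
    for text in lines:
        match = rx.match(text)
        if match is None:
            continue  # e.g. "Haemoglobin 13.0 g/dL" has no colon
        key = match.group('key').strip()
        value = match.group('value').strip()
        if " and " in key and "/" in value:
            # "Age and Sex : 37/Male" -> ("Age", "37"), ("Sex", "Male")
            pairs = zip(key.split(" and "), value.split("/"))
        else:
            pairs = [(key, value)]
        for k, v in pairs:
            if k in keys:  # keep only the predefined keys
                result[k] = v
    return result


lines = ["Name : Luke Cameron", "Age and Sex : 37/Male", "Haemoglobin 13.0 g/dL"]
keys = ['Name', 'Age', 'Sex']
print(parse_lines(lines, keys))
# {'Name': 'Luke Cameron', 'Age': '37', 'Sex': 'Male'}
```

Checking for `" and "` and `"/"` explicitly instead of relying on a `ValueError` keeps the split condition visible; `zip` pairs each label with its value by position.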