I have a column of text data that I am converting to a single string using the ','.join() function. The data is shown below. I want to extract just the BP / Blood Pressure / Systolic Blood Pressure readings and their values from the joined string.
I know I need to use re.findall(), but I can't get all the values because the readings are written in several different formats.
I want the BP values from the marked lines in the images shown below.
I want all of these variations extracted with a single regex.
My current code only matches the simplest variation (for example 'BP 98/60' or 'Blood Pressure 106/63'). I want to extend it to cover all of the variations.
import re

list_items = file['Text'].tolist()  # `file` is the pandas DataFrame holding the text column
listToStr = ','.join([str(elem) for elem in list_items])

def get_BP(s):
    # Earlier attempt, matched only "BP 98/60"-style readings:
    # regex = r'(BP \d+\/\d+)'
    regex = r'((?:BP|Blood Pressure) \d+\/\d+)'
    return re.findall(regex, s)

x = get_BP(listToStr)
print(x)
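A quick way to see which formats the current pattern misses is to run it against a few sample rows. The rows below are hypothetical, made up to mirror the desired output, since the real data only appears in the images. The current pattern requires a space directly after "BP" and a slash-separated reading, so the "B/P", "BP-Sitting", "BP:" and "Systolic/Diastolic Blood Pressure" rows never match.

```python
import re

# Hypothetical sample rows standing in for file['Text'];
# the real data is only shown in the question's images.
rows = [
    "Vitals: BP 98/60, HR 72",
    "BP 108/60 at rest",
    "BP 96/60",
    "BP 120/75, pulse 80",
    "Blood Pressure 106/63",
    "B/P - Systolic 104",
    "B/P - Diastolic 72",
    "BP-Sitting 109/70 mmHg",
    "BP: 101/72",
    "Systolic Blood Pressure 100 mmHg",
    "Diastolic Blood Pressure 68 mmHg",
]

# The pattern currently used in get_BP
current_regex = r'((?:BP|Blood Pressure) \d+\/\d+)'

# Everything the current pattern finds in the joined string
found = re.findall(current_regex, ",".join(rows))

# Rows where the current pattern finds nothing at all
missed = [row for row in rows if not re.search(current_regex, row)]

print(found)
print(missed)
```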
Ultimately, I want output like this:
['BP 98/60', 'BP 108/60', 'BP 96/60', 'BP 120/75', 'Blood Pressure 106/63',
'B/P - Systolic 104','B/P - Diastolic 72','BP-Sitting 109/70 mmH',
'BP: 101/72','Systolic Blood Pressure 100 mmHg','Diastolic Blood Pressure 68 mmHg']
As I am new to regular expressions, any help would be greatly appreciated.
Thank you.
Based on your list of desired results, you can use an alternation (|) to list every variation.
All groups in the pattern below are non-capturing (?:...), so re.findall returns each full match:
\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: mmHg?)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b
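Here is a minimal sketch of plugging that pattern into the question's get_BP. The pattern is split across three raw strings (one per alternative) purely for readability; Python concatenates adjacent literals, so it is identical to the one-line version. The sample rows are hypothetical stand-ins for file['Text'], since the real data is only in the images.

```python
import re

# The answer's pattern, one alternative per line. Adjacent raw strings are
# concatenated, so this is the exact same regex as the one-line version.
BP_PATTERN = re.compile(
    r"\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: mmHg?)?"
    r"|B/P - (?:Sys|Dias)tolic \d+"
    r"|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b"
)

def get_BP(s):
    """Return every blood-pressure reading found in s."""
    # No capturing groups, so findall returns the full matched text.
    return BP_PATTERN.findall(s)

# Hypothetical rows standing in for file['Text'].
list_items = [
    "Vitals: BP 98/60, HR 72",
    "BP 108/60 at rest",
    "BP 96/60",
    "BP 120/75, pulse 80",
    "Blood Pressure 106/63",
    "B/P - Systolic 104",
    "B/P - Diastolic 72",
    "BP-Sitting 109/70 mmHg",
    "BP: 101/72",
    "Systolic Blood Pressure 100 mmHg",
    "Diastolic Blood Pressure 68 mmHg",
]
listToStr = ",".join(str(elem) for elem in list_items)

x = get_BP(listToStr)
print(x)
```

Because | has the lowest precedence, the leading \b only applies to the first alternative and the trailing \b only to the last one; if you need word boundaries on every branch, you could wrap the whole alternation as \b(?:...)\b. The optional g in mmHg? also lets a truncated "mmH" match, as in the 'BP-Sitting 109/70 mmH' entry of the desired output.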