python-3.x, regex, data-extraction

Regex function in Python with different specifications


I have a column of text data that I am converting to a single string using ','.join(). The data looks as shown below. I want to extract just the BP / Blood Pressure / Systolic Blood Pressure labels and their corresponding values from the converted string.

I know I have to use re.findall(), but I am not able to get the values because the readings are written in several different formats.

I want the BP values from the marked lines in the images below.

[Image: Variation of BP]

[Image: Variation with Blood Pressure]

[Image: Other BP variation]

I want all these variations to be extracted using a regex function.

The code I have for now only matches the first variation. I want to extend it to match all of the variations.

import re

# file is the DataFrame holding the text column
list_items = file['Text'].tolist()

# Join every row into one comma-separated string
listToStr = ','.join(str(elem) for elem in list_items)


def get_BP(s):
    # Matches only the first variation, e.g. 'BP 98/60' or 'Blood Pressure 106/63'
    regex = r'(?:BP|Blood Pressure) \d+/\d+'
    return re.findall(regex, s)

x = get_BP(listToStr)
x

The final output I want is something like this:

['BP 98/60', 'BP 108/60', 'BP 96/60', 'BP 120/75', 'Blood Pressure 106/63', 
 'B/P - Systolic 104','B/P - Diastolic 72','BP-Sitting 109/70 mmH',
 'BP: 101/72','Systolic Blood Pressure 100 mmHg','Diastolic Blood Pressure 68 mmHg']

As I am new to regex functions, any help would be greatly appreciated.

Thank you.


Solution

  • Based on the list with the desired results, you can use an alternation | to specify all variations.

    \b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: mmHg?)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b
    

    Regex demo
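
    As a rough sketch of how the pattern above could be plugged into re.findall(), the snippet below compiles it and runs it over a small sample. The names BP_PATTERN, get_bp and sample_rows are made up for illustration, and the sample rows are taken from the desired-output list in the question (plus one invented row with surrounding noise), not from the real data.

```python
import re

# The pattern from the answer, split over three string literals
# (one per top-level alternative) purely for readability.
BP_PATTERN = re.compile(
    r"\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: mmHg?)?"
    r"|B/P - (?:Sys|Dias)tolic \d+"
    r"|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b"
)


def get_bp(text):
    """Return every blood-pressure reading found in text, in order."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return BP_PATTERN.findall(text)


# Hypothetical sample standing in for file['Text'].tolist()
sample_rows = [
    "Vitals: Pulse 72, BP 98/60, Temp 98.6",
    "Blood Pressure 106/63",
    "B/P - Systolic 104",
    "B/P - Diastolic 72",
    "BP-Sitting 109/70 mmH",
    "BP: 101/72",
    "Systolic Blood Pressure 100 mmHg",
    "Diastolic Blood Pressure 68 mmHg",
]

matches = get_bp(",".join(sample_rows))
for m in matches:
    print(m)
```

    Note that the leading \b only applies to the first alternative and the trailing \b only to the last one, because | has the lowest precedence in a regex. If the real data contains other spellings (lower case, extra spaces, and so on), the pattern would need further alternatives or flags such as re.IGNORECASE.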