I have unstructured text from which I need to extract BP (blood pressure) values and their dates, which appear in several different formats, as shown below. Right now I have a regex that extracts the BP values and the dates that follow them.
One specific case, highlighted in the picture, is where the date follows the word 'Recorded' and also has a timestamp.
There is also a case where the date occurs before the BP value. I need to extract that date and BP value as well.
Currently, my code returns the BP values and the dates that follow them. I want to extend this regex so that it also covers the new cases shown in the picture.
My current regex is below.
regex = r'\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: \d+/\d+| \d+/\d+)*(?: sm| -Lw| cB| Jr|\
-aA| cs| -ic| ic| -RG| kA| -sL| BL| kc| am| -sH| sH| es| ts| np| 8s| ca| Pm| JE| so| cp| v8| Eu| -cp|\
Pm| EB| Fr| -Fr| -ms| -LN| -mT| -mk| -GF| -HO| Jp| wD| 8m| mc| -mc| Yr| -Lp| -ml| -LA| s/d| -aA| s/d|mmHg| mm Hg|\
mm hg.|.?)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b'
An image of the current output is given below; the dates are not included in it.
I have also attached the data as a string, in case anyone needs to work with it.
Weight: 188 lbs ,Wt 124 Ib (56.2 kg) ,Height: 108.2 cm Weight: 20.9 kg BMI: 18 Lives with Father, Mother. ,Vials BP 120/75 Hu 52" We 202 I (916 kg) BMI 36.95 kg/m 354 2 mi ,W197 Ib 8 oz (44.2 kg) SpO2 99% BMI 19.69 kg/m2 BSA 1.36 m2 ,Weight 316kg ,HT: 160 cm WT: 79.6 kg BMI: 31.09 ,Blood Pressure 106/63 02/27/2019,B/P - Systolic 104,B/P - Diastolic 72,BP-Sitting 109/70 mmHg,BP: 101/72 left arm, normal cuff, seated 123/76 on 09/25/2018,Systolic Blood Pressure 100 mmHg,Diastolic Blood Pressure 68 mmHg,BP 128/80 128/81 128/82 128/83,Pain scale 0 1-10 Oxygen sat % 95 % HR 83 /min BP 144/68 mm Hg Ht , . _ Repeat BP 130/80.Just now feeling she is sure she is feeling FM.Plans to bottlefeed ,Blood Pressure 106/64s/d 78th / 77th percentileqyy Left Arm Sitting ,Blood Pressure 114/76 s/d 77th 7 goth percentileqyyy Right Arm Sitting ,BP 130/82mmHg Pulse 78 Ht 1.753 m 5' 9" Wt 78.019 kg 172 Ib BMI 25.39 kg/m2 ,BP 142/70 sm ,BP 129/87 -Lw ,BP 120/74 cB ,BP 150/80 Jr ,BP 128/80 104/58 120/84 136/78 ,nan,07/20/18 Blood Pressure 112/54 , BP 10/30/17 1345 178/80 ,Name 06/22/2018 06/22/2018 Blood Pressure 120/68 ,4/6/2015 BP 132/69 Patient Position ,nan,nan, Blood Pressure 150/88 Recorded 08Dec2017 02 49PM , Blood Pressure 150/88 Recorded 08Dec2017 02 49PM , Blood Pressure 140/88 Recorded 15Nov2017 03 21PM '
Any help with this would be greatly appreciated.
You might extend the pattern with an alternation for the specified cases: match a date-like pattern either when 'Recorded' is directly to its left, or when 'BP' or 'Blood Pressure' is directly to its right.
(?<=Recorded )\d{1,2}[A-Za-z]{3}\d{4}\b|\d{1,2}/\d{1,2}/\d{2,}\b(?= BP| Blood Pressure)
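As a quick check of just these two date alternatives, here is a minimal Python sketch. The names date_pattern and sample are my own, and the sample is a shortened piece of the data from the question, so treat it as an illustration rather than a full test.

```python
import re

# The two date alternatives from the pattern above:
#   1. DDMonYYYY directly after "Recorded " (lookbehind)
#   2. a slash-separated date directly before " BP" or " Blood Pressure" (lookahead)
date_pattern = re.compile(
    r'(?<=Recorded )\d{1,2}[A-Za-z]{3}\d{4}\b'
    r'|\d{1,2}/\d{1,2}/\d{2,}\b(?= BP| Blood Pressure)'
)

# Shortened sample (assumption: cut down from the question's data).
sample = (
    '07/20/18 Blood Pressure 112/54 ,4/6/2015 BP 132/69 Patient Position ,'
    ' Blood Pressure 150/88 Recorded 08Dec2017 02 49PM'
)

dates = date_pattern.findall(sample)
print(dates)
```

The time part after the date (02 49PM) is not captured, and a date that sits between BP and its value, as in BP 10/30/17 1345 178/80, is not covered by either alternative, so that case would need its own branch.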
The updated pattern would look like
\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: \d+/\d+| \d+/\d+)*(?: sm| -Lw| cB| Jr|\
-aA| cs| -ic| ic| -RG| kA| -sL| BL| kc| am| -sH| sH| es| ts| np| 8s| ca| Pm| JE| so| cp| v8| Eu| -cp|\
Pm| EB| Fr| -Fr| -ms| -LN| -mT| -mk| -GF| -HO| Jp| wD| 8m| mc| -mc| Yr| -Lp| -ml| -LA| s/d| -aA| s/d|mmHg| mm Hg|\
mm hg.|.?)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b|(?<=Recorded )\d{1,2}[A-Za-z]{3}\d{4}\b|\d{1,2}/\d{1,2}/\d{2,}\b(?= BP| Blood Pressure)
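To try the combined idea end to end, here is a hedged Python sketch. It is not the full pattern: bp_core is a simplified stand-in of my own that leaves out the long suffix list and the Systolic/Diastolic branches, and the sample is a shortened piece of the question's data. The parts are joined with '|' instead of being split with a trailing backslash, because inside a raw string a backslash followed by a newline is kept as those two characters rather than acting as a line continuation.

```python
import re

# Simplified stand-in for the BP part (assumption: suffix list and
# Systolic/Diastolic branches omitted to keep the sketch short).
bp_core = r'\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: \d+/\d+)*'

# The two date alternatives added in this answer.
date_after_recorded = r'(?<=Recorded )\d{1,2}[A-Za-z]{3}\d{4}\b'
date_before_bp = r'\d{1,2}/\d{1,2}/\d{2,}\b(?= BP| Blood Pressure)'

# Join the alternatives; no capture groups, so findall returns whole matches.
pattern = re.compile('|'.join([bp_core, date_after_recorded, date_before_bp]))

sample = (
    '07/20/18 Blood Pressure 112/54 ,4/6/2015 BP 132/69 Patient Position ,'
    ' Blood Pressure 150/88 Recorded 08Dec2017 02 49PM'
)

matches = pattern.findall(sample)
for m in matches:
    print(m)
```

The matches come back in the order they occur in the text, so a date found before a reading is listed just ahead of it and a 'Recorded' date just after it.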