I have unstructured data from which I need to extract BP values and dates (which appear in different formats), as shown below. Right now I have a regex that extracts the BP values. There is a specific case, highlighted in the picture, where consecutive dates, and even single dates, have to be extracted as well (but not the DOB).
Currently, my code returns only the BP values. I want a regex that extracts the BP values and the dates at the same time.
My current regex is below.
regex = r'\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?: \d+/\d+|  \d+/\d+)*(?: sm| -Lw| cB| Jr|\
-aA| cs| -ic| ic| -RG| kA| -sL| BL| kc| am| -sH| sH| es| ts| np| 8s| ca| Pm| JE| so| cp| v8| Eu| -cp|\
Pm| EB| Fr| -Fr| -ms| -LN| -mT| -mk| -GF| -HO| Jp| wD| 8m| mc| -mc| Yr| -Lp| -ml| -LA| s/d| -aA| s/d|mmHg| mm Hg|\
mm hg.|.?)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b'
The image of the current output is given below, in which dates are not included.
Any help with this would be greatly appreciated.
One option is to match an optional / followed by 1 or more digits right after the part where you match \d+/\d+, so that a value with two slashes, such as a date, is matched as well.
You can shorten this part \d+/\d+(?: \d+/\d+|  \d+/\d+)* to \d+/\d+(?:  ?\d+/\d+)* (a space followed by an optional second space), as the only difference between the two alternatives is matching 1 or 2 spaces.
Adding an optional forward slash and 1 or more digits to both the first part and the repetition would look like \d+/\d+(?:/\d+)?(?:  ?\d+/\d+(?:/\d+)?)*
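As a quick check of just this numeric part, here is a minimal Python sketch. The names NUMERIC and match_numeric and the sample strings are made up for illustration, and the "1 or 2 spaces" are written as an explicit {1,2} quantifier so they stay visible. Note that only dates written with slashes are covered by this part.

```python
import re

# Numeric core only: a value such as 120/80, optionally followed by /digits
# (which lets a slash-separated date like 12/05/2019 match), repeated after
# 1 or 2 spaces. The name NUMERIC is hypothetical.
NUMERIC = r"\d+/\d+(?:/\d+)?(?: {1,2}\d+/\d+(?:/\d+)?)*"


def match_numeric(text):
    """Return the first run of slash-separated numbers in text, or None."""
    m = re.search(NUMERIC, text)
    return m.group(0) if m else None


print(match_numeric("BP 120/80 12/05/2019"))  # 120/80 12/05/2019
```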
The updated pattern:
\b(?:BP:?(?:-Sitting)?|Blood Pressure) \d+/\d+(?:/\d+)?(?:  ?\d+/\d+(?:/\d+)?)*(?: sm| -Lw| cB| Jr|\
-aA| cs| -ic| ic| -RG| kA| -sL| BL| kc| am| -sH| sH| es| ts| np| 8s| ca| Pm| JE| so| cp| v8| Eu| -cp|\
Pm| EB| Fr| -Fr| -ms| -LN| -mT| -mk| -GF| -HO| Jp| wD| 8m| mc| -mc| Yr| -Lp| -ml| -LA| s/d| -aA| s/d|mmHg| mm Hg|\
mm hg.)?|B/P - (?:Sys|Dias)tolic \d+|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b
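A sketch of how the pattern might be assembled and used in Python follows. One thing to be aware of: inside a Python raw string, a backslash followed by a newline is kept as those two characters rather than treated as a line continuation, so the sketch joins the pieces with string concatenation instead. The names SUFFIXES, NUMERIC, PATTERN and extract_bp are hypothetical, SUFFIXES holds only a few of the alternatives for brevity, and the sample text is invented.

```python
import re

# Only a handful of the suffix alternatives, for brevity (assumption: the
# real list from the pattern above would be substituted here).
SUFFIXES = [" sm", " -Lw", " cB", "mmHg", " mm Hg"]

# BP readings and slash-separated dates, separated by 1 or 2 spaces.
NUMERIC = r"\d+/\d+(?:/\d+)?(?: {1,2}\d+/\d+(?:/\d+)?)*"

PATTERN = re.compile(
    r"\b(?:BP:?(?:-Sitting)?|Blood Pressure) " + NUMERIC
    + "(?:" + "|".join(re.escape(s) for s in SUFFIXES) + ")?"
    + r"|B/P - (?:Sys|Dias)tolic \d+"
    + r"|(?:Sys|Dias)tolic Blood Pressure \d+ \w+\b"
)


def extract_bp(text):
    """Return every full match of PATTERN in text, in order."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = ("Pt seen. BP 120/80 12/05/2019, Blood Pressure 118/76 "
          "01/02/2020 01/03/2020 and B/P - Systolic 130")
for hit in extract_bp(sample):
    print(hit)
```

The dates are only picked up when they directly follow a BP value, because the date part lives inside the BP alternative.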
Note that I have omitted the .? at the end of the alternation, as it would also match a trailing whitespace character.
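The effect of that trailing .? can be seen with a cut-down version of the pattern. This is only an illustration: WITH_DOT, WITHOUT_DOT and first_match are hypothetical names, and the patterns keep a single suffix alternative.

```python
import re

# Cut-down patterns: one suffix alternative, with and without the trailing .?
WITH_DOT = re.compile(r"BP \d+/\d+(?: mmHg|.?)?")
WITHOUT_DOT = re.compile(r"BP \d+/\d+(?: mmHg)?")


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


text = "BP 120/80 today"
print(repr(first_match(WITH_DOT, text)))     # 'BP 120/80 '  (trailing space)
print(repr(first_match(WITHOUT_DOT, text)))  # 'BP 120/80'
```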