RegEx question: standardization of medical terms


I need to detect terms such as 'bot/hersen/levermetastase' and transform them into 'botmetastase, hersenmetastase, levermetastase', and likewise 'lever/botmetastase' into 'levermetastase, botmetastase'.

So the pattern needs to handle any number of slash-separated words before 'metastase'.

This is my attempt, but it doesn't work.

FILTERIN:

\b(\w)\s*[\/]\s*(\w)\s*(metastase)\b 

FILTEROUT:

$1metastase, $2metastase, $3metastase

Solution

  • You may use

    /?(\w+)(?=(?:/\w+)+metastase\b)/?
    

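    Replace with $1metastase followed by a space. To get the comma-separated output shown in the question, replace with $1metastase followed by a comma and a space instead.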

    If there can be spaces around the slashes, use one of

    /?\s*(\w+)(?=(?:\s*/\s*\w+)+metastase\b)(?:\s*/)?
    /?\h*(\w+)(?=(?:\h*/\h*\w+)+metastase\b)(?:\h*/)?
    

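    where \h matches only horizontal whitespace chars (such as spaces and tabs, but not line breaks), and \s matches any whitespace char, including line breaks. Note that \h is not available in every regex flavor.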


    Details

    • /? - an optional / char
    • (\w+) - Group 1: one or more word chars
    • (?=(?:/\w+)+metastase\b) - a positive lookahead: Group 1 must be followed by
      • (?:/\w+)+ - one or more occurrences of / and then 1+ word chars
      • metastase\b - and metastase whole word (\b is a word boundary)
    • /? - an optional / char.
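
    As a rough check of the patterns above, here is a minimal Python sketch. Several things in it are assumptions and not part of the answer: the helper name expand_terms is hypothetical, the replacement uses a comma plus a space to match the output the question asks for, and [ \t] stands in for \h because Python's re module does not support \h.

```python
import re

# Slash-only pattern from the answer, unchanged.
SLASH_ONLY = re.compile(r"/?(\w+)(?=(?:/\w+)+metastase\b)/?")

# Whitespace-tolerant variants. \s is used as in the answer; [ \t] is an
# assumed stand-in for \h, which Python's re module does not provide.
ANY_WS = re.compile(r"/?\s*(\w+)(?=(?:\s*/\s*\w+)+metastase\b)(?:\s*/)?")
HORIZONTAL_WS = re.compile(
    r"/?[ \t]*(\w+)(?=(?:[ \t]*/[ \t]*\w+)+metastase\b)(?:[ \t]*/)?"
)


def expand_terms(text: str) -> str:
    """Expand 'a/b/cmetastase' into 'ametastase, bmetastase, cmetastase'.

    Hypothetical helper: each prefix that is followed by more
    slash-separated words ending in 'metastase' gets the suffix appended.
    The last word already carries the suffix and is left untouched.
    """
    return SLASH_ONLY.sub(r"\1metastase, ", text)


print(expand_terms("bot/hersen/levermetastase"))
print(expand_terms("lever/botmetastase"))

# A term split across a line break: \s bridges the newline, [ \t] does not.
wrapped = "lever /\nbotmetastase"
print(ANY_WS.findall(wrapped))
print(HORIZONTAL_WS.findall(wrapped))
```

    The lookahead is what lets every prefix see the shared suffix without consuming it, so the same substitution works for two, three, or more slash-separated words.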