Inserting the succeeding word before the 'and' conjunction using regex in Python for various cases


The problem is shown below for different cases, using example strings.

Case 1

Input string:

However, the gene of hBD-1 and LL-27 expression was not affected by acnes.

Code:

import re
str_a = "However, the gene of hBD-1 and LL-27 expression was not affected by acnes."
out_a = re.sub(r'\b(\w+-\d+)\b(?! expression\b)', r'\1 expression', str_a)
print(out_a)

Output string:

However, the gene of hBD-1 expression and LL-27 expression was not affected by acnes.

Case 2

Input string:

The gene of acne and non-acne patients was affected by cancer.

Code:

import re
str_b = "The gene of acne and non-acne patients was affected by cancer."
out_b = re.sub(r'\b(acne)\b(?! patients\b)', r'\1 patients', str_b)
print(out_b)

Output string:

The gene of acne patients and non-acne patients was affected by cancer.

Case 3

Input string:

Since, the gene of hBD-1 and LL-27 expression was not affected by acnes therefore the gene of acne and non-acne patients was affected by cancer.

Expected output string:

Since, the gene of hBD-1 expression and LL-27 expression was not affected by acnes therefore the gene of acne patients and non-acne patients was affected by cancer.

What I need:

How can I make these two regexes work for generic cases? At the moment I have to run two different regexes on two different strings. For Case 3, how would I combine the regexes from both cases into a single regex? Please modify the regex or suggest a better solution.


Solution

  • What you might do is use two capture groups, one per alternative, and call re.sub with a lambda that checks which group matched.

    import re
    
    regex = r"\b(\w+-\d+)\b(?! expression\b)|\b(acne)\b(?! patients\b)"
    
    s = ("However, the gene of hBD-1 and LL-27 expression was not affected by acnes.\n\n"
         "The gene of acne and non-acne patients was affected by cancer.\n\n"
         "Since, the gene of hBD-1 and LL-27 expression was not affected by acnes therefore the gene of acne and non-acne patients was affected by cancer.")
    
    result = re.sub(regex, lambda x: x.group(1) + " expression" if x.group(1) else x.group(2) + " patients", s)
    print(result)
    

    Output

    However, the gene of hBD-1 expression and LL-27 expression was not affected by acnes.
    
    The gene of acne patients and non-acne patients was affected by cancer.
    
    Since, the gene of hBD-1 expression and LL-27 expression was not affected by acnes therefore the gene of acne patients and non-acne patients was affected by cancer.
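
If more term/word pairs need to be handled, one way to generalize the two-group alternation is to build the pattern from a table and give each alternative a named group, so the replacement can read the word to append from `m.lastgroup`. This is only a sketch: the `RULES` dict and the `add_missing_word` helper are hypothetical names, and it assumes each word to append is a valid Python identifier, because it doubles as the group name.

```python
import re

# Hypothetical rule table: word to append -> regex for the terms that need it.
# The two entries mirror the two patterns used in the question.
RULES = {
    "expression": r"\w+-\d+",
    "patients": r"acne",
}

# One alternative per rule. Each alternative is a named group, followed by a
# negative lookahead so that terms already followed by the word are skipped.
PATTERN = re.compile(
    "|".join(
        rf"\b(?P<{word}>{term})\b(?! {word}\b)"
        for word, term in RULES.items()
    )
)


def add_missing_word(text):
    # m.lastgroup is the name of the group that matched, i.e. the word to append.
    return PATTERN.sub(lambda m: f"{m.group()} {m.lastgroup}", text)


s = ("Since, the gene of hBD-1 and LL-27 expression was not affected by acnes "
     "therefore the gene of acne and non-acne patients was affected by cancer.")
print(add_missing_word(s))
```

Adding another case then only means adding an entry to `RULES`. Because of the lookahead, running the function again on its own output should leave the text unchanged.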