python regex string regex-group python-re

Insert the following word before the conjunction 'and' using regex


Input String:

However, the gene of hBD-1 and LL-27 expression was not affected by cancer in both acne and non-acne patients.

Expected Output String:

However, the gene of hBD-1 expression and LL-27 expression was not affected by cancer in both acne patients and non-acne patients.

Code:

import re
string_a = "However, the gene of hBD-1 and LL-27 expression was not affected by cancer in both acne and non-acne patients."
print(string_a)
print('\n')
output = re.sub(r'\b(\w+-(\d+|[A-Za-z]+))\b(?! [A-Za-z]+\b)', r'\b(\1 [A-Za-z]+)\b', string_a)
print(output)

I am not getting the expected output string. Please look at my code and suggest a fix.


Solution

  • I would use re.sub here to append the text "expression" to every gene term that is not already followed by it. The negative lookahead (?! expression\b) skips terms that already have it.

    import re
    inp = "However, the gene of hBD-1 and LL-27 expression was not affected by acnes."
    # Match a gene term such as hBD-1 that is not already followed by " expression".
    output = re.sub(r'\b(\w+-\d+)\b(?! expression\b)', r'\1 expression', inp)
    print(output)
    

    This prints:

    However, the gene of hBD-1 expression and LL-27 expression was not affected by acnes.
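
The answer above hard-codes the word "expression", so it does not cover the "acne and non-acne patients" part of the expected output. One possible generalization, offered only as a sketch, is to capture the word that follows the second item of each "X and Y Z" group and repeat it after the first item. The function name repeat_shared_word and the pattern below are illustrative assumptions, not part of the original answer, and the heuristic assumes that every "X and Y Z" in the text should be expanded, which will not hold for arbitrary sentences. Note also that the replacement argument of re.sub is a template rather than a pattern, so a character class such as [A-Za-z]+ placed there is inserted literally; the captured word has to come from a group in the pattern.

```python
import re

# Hypothetical helper: "X and Y Z" -> "X Z and Y Z".
# Assumes X and Y contain no whitespace and Z is a single word.
_PATTERN = re.compile(r'(\S+) and (\S+) (\w+)')


def _expand(match: re.Match) -> str:
    first, second, shared = match.groups()
    # If the first item already is the shared word (for example
    # "expression and LL-27 expression"), leave the text untouched.
    if first == shared:
        return match.group(0)
    return f"{first} {shared} and {second} {shared}"


def repeat_shared_word(text: str) -> str:
    return _PATTERN.sub(_expand, text)


string_a = ("However, the gene of hBD-1 and LL-27 expression was not "
            "affected by cancer in both acne and non-acne patients.")
print(repeat_shared_word(string_a))
```

For the input string from the question, this prints the expected output string given there. Using a function as the replacement keeps the substitution from adding the word again when the function is run on text that has already been expanded.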