Tags: python, regex, python-3.x, regex-group

Error adding a `|` separator before each tag using regex in Python


I want to add a | before every tag. Below is the code I have used.

tags = ['XYZ', 'CREF', 'BREF', 'RREF', 'REF']

string_data = 'XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY'

for each_tag in tags:
    result = string_data.replace(each_tag, "|" + each_tag)
    print(result)

How can I do this using regex?

Input String:

XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY

Actual result (wrong):

XYZ:MUMBAI UNIVERSITYC|REF:PUNE UNIVERSITYB|REF:DADAR UNIVERSITYR|REF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY

Expected result:

|XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY

Is there any way to do it using regex?


Solution

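  • You could match REF preceded by an optional B or R, or preceded by a C when that C is not itself preceded by an L (using a negative lookbehind). This way the C in LLCREF stays with LLC and only REF is treated as the tag.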

    (?:[BR]?|(?<!L)C)REF|^(?!\|)
    

    Explanation

    • (?: Non capture group
      • [BR]? Match an optional B or R
      • | Or
      • (?<!L)C Match a C and assert what is directly to the left is not L
    • ) Close group
    • REF Match literally
    • | Or
    • ^(?!\|) Assert the start of the string when not directly followed by a | to prevent starting with a double || if there already is one present
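The parts listed above can also be written as a commented pattern using `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the pattern. This is only a sketch of the same regex; the name `TAG_PATTERN` is an illustrative choice, not something from the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# TAG_PATTERN is a hypothetical name used only for this sketch.
TAG_PATTERN = re.compile(
    r"""
    (?:             # non-capturing group
        [BR]?       #   an optional B or R
      |             #   or
        (?<!L)C     #   a C that is not directly preceded by L
    )
    REF             # the literal text REF
  |                 # or
    ^(?!\|)         # start of the string, unless a | already follows
    """,
    re.VERBOSE,
)

test_str = (
    "XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITY"
    "RREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY"
)

# \g<0> re-inserts the whole match after the added pipe.
result = TAG_PATTERN.sub(r"|\g<0>", test_str)
print(result)
```

The verbose form should behave the same as the one-line pattern; it is just easier to read and to adjust if more tag prefixes need to be handled.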


    In the replacement, use the full match (\g<0>) prepended with a pipe

    |\g<0>
    

    For example

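    import re

    # Match BREF/RREF/REF, or CREF when the C is not preceded by L; also match the start of the string
    regex = r"(?:[BR]?|(?<!L)C)REF|^(?!\|)"
    test_str = "XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY"
    # \g<0> is the whole match, so every match is written back with a leading pipe
    subst = r"|\g<0>"
    result = re.sub(regex, subst, test_str)

    print(result)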
    

    Output

    |XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY
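To illustrate what the `^(?!\|)` branch is for, here is a small sketch that runs the same substitution on an input with and without a leading pipe. The helper name `add_pipes` and the shortened sample strings are assumptions made for this example only.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(r"(?:[BR]?|(?<!L)C)REF|^(?!\|)")


def add_pipes(text: str) -> str:
    """Hypothetical helper: put a pipe before each matched tag and at the string start."""
    return PATTERN.sub(r"|\g<0>", text)


# Input that does not start with a pipe: one is added at the start.
without_pipe = add_pipes("XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITY")

# Input that already starts with a pipe: the lookahead stops a second one being added there.
with_pipe = add_pipes("|XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITY")

print(without_pipe)
print(with_pipe)
```

Both calls should print the same line. Note that the lookahead only guards the very start of the string: a tag such as CREF that already has a pipe in front of it would still get another one, so the substitution is not meant to be applied twice to the same text.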