I want to add | before every tag.
Here is the code I have used:
tags = ['XYZ', 'CREF', 'BREF', 'RREF', 'REF']
string_data = 'XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY'
for each_tag in tags:
    result = string_data.replace(each_tag, "|" + each_tag)
print(result)
How can I do this using a regex?
Input String:
XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY
Actual result (wrong):
XYZ:MUMBAI UNIVERSITYC|REF:PUNE UNIVERSITYB|REF:DADAR UNIVERSITYR|REF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY
Expected result:
|XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY
Is there any way to do it using regex?
You could match an optional B or R, or match a C when it is not preceded by an L, using a negative lookbehind.
(?:[BR]?|(?<!L)C)REF|^(?!\|)
Explanation
(?:  Non-capturing group
[BR]?  Match an optional B or R
|  Or
(?<!L)C  Match a C and assert that what is directly to the left is not an L
)  Close the group
REF  Match REF literally
|  Or
^(?!\|)  Assert the start of the string when it is not directly followed by a |, to prevent starting with a double || if one is already present
In the replacement, use the full match prepended with a pipe:
|\g<0>
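To see which pieces of the input the pattern picks up before any replacement happens, a quick check with re.findall can help. This is only a sketch; the variable names are made up for illustration.

```python
import re

pattern = r"(?:[BR]?|(?<!L)C)REF|^(?!\|)"
text = ("XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITY"
        "RREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY")

# The pattern has no capture groups, so findall returns the full matches.
matches = re.findall(pattern, text)
print(matches)
# ['', 'CREF', 'BREF', 'RREF', 'REF']
```

The empty string is the zero-width match at the start of the string, which is where the pipe before XYZ comes from. The last item is REF rather than CREF because the C in LLCREF is preceded by an L.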
For example
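import re

# Match BREF, RREF or REF, a CREF not preceded by L, or the start of the string
regex = r"(?:[BR]?|(?<!L)C)REF|^(?!\|)"
test_str = "XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITYBREF:DADAR UNIVERSITYRREF:KOLHAPUR UNIVERCITY LLCREF:SOLAPUR UNIVERSITY"
# \g<0> refers to the whole match, so every match gets a pipe in front of it
subst = r"|\g<0>"
result = re.sub(regex, subst, test_str)
print(result)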
Output
|XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY|BREF:DADAR UNIVERSITY|RREF:KOLHAPUR UNIVERCITY LLC|REF:SOLAPUR UNIVERSITY
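If the substitution is needed in more than one place, it could be wrapped in a small helper. The sketch below assumes the same pattern as above; the names TAG_PATTERN and add_pipes are hypothetical and only there for illustration.

```python
import re

# Same pattern as above, compiled once for reuse.
TAG_PATTERN = re.compile(r"(?:[BR]?|(?<!L)C)REF|^(?!\|)")

def add_pipes(text):
    """Prefix each matched tag, and the start of the string, with a pipe."""
    return TAG_PATTERN.sub(r"|\g<0>", text)

print(add_pipes("XYZ:MUMBAI UNIVERSITYCREF:PUNE UNIVERSITY"))
# |XYZ:MUMBAI UNIVERSITY|CREF:PUNE UNIVERSITY

# A pipe that is already at the start of the string is not doubled.
print(add_pipes("|XYZ:MUMBAI UNIVERSITY"))
# |XYZ:MUMBAI UNIVERSITY
```

Note that only the leading pipe is guarded. A tag further along the string that already has a pipe in front of it would get a second one, so this is best run on input that has not been processed yet.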