I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.
Is there a way to check if a letter is before 'CXO' using regex? Or what would be the best way to accomplish this task?
EDIT: My code is as follows...
import re
test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)
The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.
Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"
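A lookbehind is a good fit here: it checks the character before 'CTO' without consuming it, so the '/' is left in place. (Your original pattern's [^a-zA-Z0-9] matches the '/' itself, which is why it was replaced along with 'CTO'.)
import re
cto_re = re.compile(r"(?<=[^a-zA-Z0-9])CTO")
Unfortunately, this does not match at the start of the string, because the lookbehind requires a character before 'CTO'. Python's re module only supports fixed-width lookbehinds, so the start-of-string case cannot be folded into the same lookbehind: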
for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))
Co-Founder/Chief Technology Officer
CTO/Bossy
aCTOrMan
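To illustrate the fixed-width restriction, here is a minimal sketch of what happens if you try to put the start-of-string anchor inside the lookbehind itself. The variable names are only for illustration; the behaviour shown is that of the standard-library re module.

```python
import re

# "^" has width 0 and the character class has width 1, so the two
# alternatives inside the lookbehind do not have the same length.
# The standard-library re module rejects such a pattern at compile time.
try:
    re.compile(r"(?<=^|[^a-zA-Z0-9])CTO")
    compiled = True
except re.error as exc:
    compiled = False
    print("Pattern rejected:", exc)
```

This is why the start-of-string case has to be handled outside the lookbehind.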
You have to handle the start of the string explicitly with an alternation (|):
cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")
for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))
Co-Founder/Chief Technology Officer
Chief Technology Officer/Bossy
aCTOrMan
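As a possible alternative, a negative lookbehind and a negative lookahead avoid the alternation and also check the character to the right of 'CTO', which the question asks for and the pattern above does not (it would still replace the 'CTO' in '/CTOs'). This is only a sketch: normalize_title and the extra test strings are hypothetical names and inputs, not part of the original code.

```python
import re

# (?<![a-zA-Z0-9]) : not preceded by a letter or digit (also true at string start)
# (?![a-zA-Z0-9])  : not followed by a letter or digit (also true at string end)
CTO_RE = re.compile(r"(?<![a-zA-Z0-9])CTO(?![a-zA-Z0-9])")

def normalize_title(title):
    """Hypothetical helper: expand a standalone 'CTO' in a title."""
    return CTO_RE.sub("Chief Technology Officer", title)

for eg in ("Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOs", "Director"):
    print(normalize_title(eg))
```

Because negative lookarounds succeed when there is no character at all, the start and end of the string need no special handling.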