Python Regex: How to find a substring


I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.

Is there a way to check if a letter is before 'CXO' using regex? Or what would be the best way to accomplish this task?

EDIT: My code is as follows...

import re

test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)

The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.


Solution

  • Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"

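    Lookbehinds are perfect for this:

    import re

    cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO")

    Unfortunately, this won't match at the start of a string: the lookbehind needs a character before 'CTO', and Python's re module requires lookbehinds to be fixed-width, so ^ can't be added as an alternative inside it.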

    for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
        print(cto_re.sub("Chief Technology Officer", eg))

    Output:

    Co-Founder/Chief Technology Officer
    CTO/Bossy
    aCTOrMan
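
The fixed-width restriction can be checked directly. This is a minimal sketch: `compiles` is a hypothetical helper introduced only for illustration, and it simply reports whether `re.compile` accepts a pattern. A lookbehind that mixes `^` (width 0) with a character class (width 1) should be rejected.

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts `pattern`, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Every alternative inside the lookbehind is exactly one character wide.
fixed_width = compiles("(?<=[^a-zA-Z0-9])CTO")

# Alternatives of width 0 (^) and width 1 (the character class) are mixed.
variable_width = compiles("(?<=^|[^a-zA-Z0-9])CTO")

print(fixed_width, variable_width)
```

This is why the start-of-string case has to be handled outside the lookbehind.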
    

    You have to handle the start of the string explicitly with an alternation (|):

    cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")
    
    for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
        print(cto_re.sub("Chief Technology Officer", eg))

    Output:

    Co-Founder/Chief Technology Officer
    Chief Technology Officer/Bossy
    aCTOrMan
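
The question also asks that no letter follow 'CTO', which the pattern above does not check. One possible alternative, not the pattern from this answer, is to use negative lookarounds: `(?<!...)` succeeds at the start of the string because there is nothing there to match, so no separate `^CTO` branch is needed. The sketch below assumes digits should count as "letters" (as in the original character class); `normalize` and the extra "CTOs" test title are hypothetical, added for illustration.

```python
import re

# No letter or digit directly before or directly after "CTO".
cto_re = re.compile("(?<![a-zA-Z0-9])CTO(?![a-zA-Z0-9])")

def normalize(title):
    """Replace a standalone 'CTO' with the full title."""
    return cto_re.sub("Chief Technology Officer", title)

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOs":
    print(normalize(eg))
```

With this pattern, 'CTO' at either end of the string is replaced, while 'aCTOrMan' and 'CTOs' are left alone.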