python, string, recursion, split, prefix

Recursively split strings that contain a defined set of prefixes - Python


If I have a list of prefixes that can be attached to a string, how do I split such a string into its prefixes and the remaining characters? For example:

prefixes = ['over','under','re','un','co']

str1 = "overachieve"
output = ["over","achieve"]

str2 = "reundo"
output = ["re","un","do"]

Is there a better way to do the above task, maybe with regex or some string functions other than:

str1 = "reundo"
output = []

for x in [p for p in prefixes if p in str1]:
    output.append(x)
    str1 = str1.replace(x, "", 1)
output.append(str1)

Solution

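  • Regular expressions are an efficient way to search for many alternative prefixes. Join the prefixes into a single alternation, longest first so that 'under' is tried before 'un', and match repeatedly from the end of the previous prefix: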

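    import re

    def split_prefixes(word, prefixes):
        # Try longer prefixes first so 'under' wins over 'un'; escape each
        # prefix so any regex metacharacters in it are matched literally.
        alternatives = sorted(prefixes, key=len, reverse=True)
        regex = re.compile('|'.join(map(re.escape, alternatives)))
        result = []
        i = 0
        while True:
            # Anchor the match at position i, just past the last prefix found.
            mo = regex.match(word, i)
            if mo is None:
                # No more prefixes: the rest of the word is the stem.
                result.append(word[i:])
                return result
            result.append(mo.group())
            i = mo.end()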
    
    
    >>> prefixes = ['over', 'under', 're', 'un', 'co']
    >>> for word in ['overachieve', 'reundo', 'empire', 'coprocessor']:
    ...     print(word, '-->', split_prefixes(word, prefixes))
    ...
    overachieve --> ['over', 'achieve']
    reundo --> ['re', 'un', 'do']
    empire --> ['empire']
    coprocessor --> ['co', 'processor']
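
  • Since the question also asks about plain string functions, here is a minimal sketch of the same greedy idea without regex, using str.startswith. The name split_prefixes_plain is hypothetical (it is not part of the answer above), and skipping empty prefixes is an assumption made here so the loop always makes progress:

```python
def split_prefixes_plain(word, prefixes):
    """Peel known prefixes off the front of word without using regex."""
    # Longest first so 'under' is tried before 'un'; drop empty strings,
    # which would match forever without consuming any characters.
    ordered = sorted((p for p in prefixes if p), key=len, reverse=True)
    result = []
    while True:
        for p in ordered:
            if word.startswith(p):
                result.append(p)
                word = word[len(p):]
                break
        else:
            # No prefix matched: whatever is left is the stem.
            result.append(word)
            return result


prefixes = ['over', 'under', 're', 'un', 'co']
for word in ['overachieve', 'reundo', 'empire', 'coprocessor']:
    print(word, '-->', split_prefixes_plain(word, prefixes))
```

    Like the regex version, this is greedy: it strips prefixes for as long as one matches, so a word that consists only of prefixes ends with an empty string as its last element.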