
strip removes '_' unexpectedly


>>> x = 'abc_cde_fgh'
>>> x.strip('abc_cde')
'fgh'

I expected '_fgh'.

How should I understand this result?


Solution

  • strip treats its argument as a set of characters and removes any of them from both ends of the string; it does not remove a leading or trailing word.

    This example demonstrates it nicely:

    >>> x.strip('ab_ch')
    'de_fg'
    

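    Since the characters "a", "b", "c", "h", and "_" are in the set of characters to remove, the leading "abc_c" and the trailing "h" are removed. Stripping stops at the first character on each end that is not in the set ("d" on the left, "g" on the right), so the rest of the string is left untouched. The same applies to the call in the question: the set for strip('abc_cde') contains "_", so the whole leading "abc_cde_" is removed and only 'fgh' remains.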
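
    To make this behaviour concrete, here is a rough pure-Python equivalent of strip with a character argument. It is only an illustration: the name strip_chars is hypothetical, and the built-in method is not implemented this way internally.

```python
def strip_chars(s, chars):
    """Illustrative re-implementation of s.strip(chars) (hypothetical helper)."""
    remove = set(chars)  # order and repetition in chars do not matter
    start, end = 0, len(s)
    # Advance past leading characters that are in the set.
    while start < end and s[start] in remove:
        start += 1
    # Back up past trailing characters that are in the set.
    while end > start and s[end - 1] in remove:
        end -= 1
    return s[start:end]

x = 'abc_cde_fgh'
print(strip_chars(x, 'abc_cde'))  # fgh
print(strip_chars(x, 'ab_ch'))    # de_fg
```

    Because only set membership is checked, 'abc_cde' and 'edc_ba' would give the same result.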

    If you would like to remove a leading or trailing word, I would recommend using the re module or startswith/endswith:

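    def rstrip_word(s, word):
        # Remove one trailing occurrence of word. The empty-word check
        # avoids s[:-0], which would return an empty string.
        if word and s.endswith(word):
            return s[:-len(word)]
        return s
    
    def lstrip_word(s, word):
        # Remove one leading occurrence of word.
        if s.startswith(word):
            return s[len(word):]
        return s
    
    def strip_word(s, word):
        # Remove word once from each end.
        return rstrip_word(lstrip_word(s, word), word)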
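
    The re alternative mentioned above could look roughly like the sketch below. The name strip_word_re is hypothetical and used only for illustration. On Python 3.9 or later, the built-in str.removeprefix and str.removesuffix methods do the same job without a regular expression, as the last line shows.

```python
import re

def strip_word_re(s, word):
    """Remove one leading and one trailing occurrence of word (hypothetical helper)."""
    w = re.escape(word)  # treat word literally, not as a pattern
    # ^ anchors to the start of the string, \Z to the very end.
    return re.sub(rf'^{w}|{w}\Z', '', s)

x = 'abc_cde_fgh'
print(strip_word_re(x, 'abc'))  # _cde_fgh
print(strip_word_re(x, 'fgh'))  # abc_cde_
print(x.removeprefix('abc').removesuffix('fgh'))  # _cde_ (Python 3.9+ only)
```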
    

    Removing Multiple Words

    A very simple (greedy) implementation that removes one of several candidate words from each end of a string looks like this:

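    def rstrip_word(s, *words):
        # Remove the first word in words that s ends with (first match wins).
        for word in words:
            if word and s.endswith(word):
                return s[:-len(word)]
        return s
    
    def lstrip_word(s, *words):
        # Remove the first word in words that s starts with (first match wins).
        for word in words:
            if word and s.startswith(word):
                return s[len(word):]
        return s
    
    def strip_word(s, *words):
        # Remove at most one word from each end.
        return rstrip_word(lstrip_word(s, *words), *words)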
    

    Please note that this algorithm is greedy: it removes the first word that matches, in the order given, and then returns, so it may not behave as you expect. Finding the longest match is not too tricky, but it is a bit more involved.

    >>> strip_word(x, "abc", "abc_")
    '_cde_fgh'
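
    One way to get the longest-match behaviour is to try the candidate words from longest to shortest. This is only a sketch; the *_longest names are hypothetical and not part of the code above.

```python
def lstrip_longest(s, *words):
    """Remove the longest word in words that s starts with, if any."""
    for word in sorted(words, key=len, reverse=True):
        if word and s.startswith(word):
            return s[len(word):]
    return s

def rstrip_longest(s, *words):
    """Remove the longest word in words that s ends with, if any."""
    for word in sorted(words, key=len, reverse=True):
        if word and s.endswith(word):
            return s[:-len(word)]
    return s

def strip_longest(s, *words):
    """Apply the longest-match removal once at each end."""
    return rstrip_longest(lstrip_longest(s, *words), *words)

x = 'abc_cde_fgh'
print(strip_longest(x, 'abc', 'abc_'))         # cde_fgh
print(strip_longest(x, 'abc', 'abc_', 'fgh'))  # cde_
```

    Sorting on every call keeps the sketch short; for a fixed word list called many times, sorting once up front would be the obvious refinement.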