Search code examples
pythonstringsplitpermutation

python string split by separator all possible permutations


This might be heavily related to similar questions as Python 3.3: Split string and create all combinations , but I can't infer a pythonic solution out of this.

Question is:

Let there be a str such as 'hi|guys|whats|app', and I need all permutations of splitting that str by a separator. Example:

#splitting only once
['hi','guys|whats|app']
['hi|guys','whats|app']
['hi|guys|whats','app']
#splitting only twice
['hi','guys','whats|app']
['hi','guys|whats','app']
#splitting only three times
...
etc

I could write a backtracking algorithm, but does python (itertools, e.g.) offer a library that simplifies this algorithm?

Thanks in advance!!


Solution

  • An approach, once you have split the string is to use itertools.combinations to define the split points in the list, the other positions should be fused again.

    def lst_merge(lst, positions, sep='|'):
        '''merges a list on points other than positions'''
        '''A, B, C, D and 0, 1 -> A, B, C|D'''
        a = -1
        out = []
        for b in list(positions)+[len(lst)-1]:
            out.append('|'.join(lst[a+1:b+1]))
            a = b
        return out
    
    def split_comb(s, split=1, sep='|'):
        from itertools import combinations
        l = s.split(sep)
        return [lst_merge(l, pos, sep=sep)
                for pos in combinations(range(len(l)-1), split)]
    

    examples

    >>> split_comb('hi|guys|whats|app', 0)
    [['hi|guys|whats|app']]
    
    >>> split_comb('hi|guys|whats|app', 1)
    [['hi', 'guys|whats|app'],
     ['hi|guys', 'whats|app'],
     ['hi|guys|whats', 'app']]
    
    >>> split_comb('hi|guys|whats|app', 2)
    [['hi', 'guys', 'whats|app'],
     ['hi', 'guys|whats', 'app'],
     ['hi|guys', 'whats', 'app']]
    
    >>> split_comb('hi|guys|whats|app', 3)
    [['hi', 'guys', 'whats', 'app']]
    
    >>> split_comb('hi|guys|whats|app', 4)
    [] ## impossible
    

    rationale

    ABCD -> A B C D
             0 1 2
    
    combinations of split points: 0/1 or 0/2 or 1/2
    
    0/1 -> merge on 2 -> A B CD
    0/2 -> merge on 1 -> A BC D
    1/2 -> merge on 0 -> AB C D
    

    generic function

    Here is a generic version, working like above but also taking -1 as parameter for split, in which case it will output all combinations

    def lst_merge(lst, positions, sep='|'):
        a = -1
        out = []
        for b in list(positions)+[len(lst)-1]:
            out.append('|'.join(lst[a+1:b+1]))
            a = b
        return out
    
    def split_comb(s, split=1, sep='|'):
        from itertools import combinations, chain
        
        l = s.split(sep)
        
        if split == -1:
            pos = chain.from_iterable(combinations(range(len(l)-1), r)
                                      for r in range(len(l)+1))
        else:
            pos = combinations(range(len(l)-1), split)
            
        return [lst_merge(l, pos, sep=sep)
                for pos in pos]
    

    example:

    >>> split_comb('hi|guys|whats|app', -1)
    [['hi|guys|whats|app'],
     ['hi', 'guys|whats|app'],
     ['hi|guys', 'whats|app'],
     ['hi|guys|whats', 'app'],
     ['hi', 'guys', 'whats|app'],
     ['hi', 'guys|whats', 'app'],
     ['hi|guys', 'whats', 'app'],
     ['hi', 'guys', 'whats', 'app']]