Search code examples
setsubsetpartition

All ways to partition a string


I'm trying to find a efficient algorithm to get all ways to partition a string

eg for a given string 'abcd' =>
'a' 'bcd'
'a' 'b' 'cd'
'a' 'b' 'c' 'd'
'ab' 'cd'
'ab' 'c' 'd'
'abc' 'd'
'a', 'bc', 'd

any language would be appreciated

Thanks in advance !


Solution

  • Problem analysis

    Between each pair of adjacent characters, you can decide whether to cut. For a string of size n, there are n-1 positions where you can cut or not, i.e. there are two possibilities. Therefore a string of size n can be partitioned in 2n-1 ways.

    The output consists of 2n-1 partitions, each having n characters plus separators. So we can describe the output size as f(n) = 2n-1 * n + s(n) where s(n) ≥ 0 accounts for the partition separators and line separators.

    So due to the output size alone, an algorithm solving this problem must have exponential runtime or worse: Ω(2n).

    (0 ≤ c * 2n = ½ * 2n = 2n-1 ≤ 2n-1 * n ≤ f(n) for all n≥k with positive constants c=½, k=1)


    Solution

    I chose to represent a partition as integer. Each bit in cutpoints determines whether to cut between characters i and i+1. To iterate through all possible partitions, we just need to go trough all integers between 0 and 2^(n-1) - 1.

    Example: For a string of length 4, we go through all integers between 0 and 2^3 - 1 or 0 and 7 or in binary: 000 and 111.

    # (python 2 or 3)
    def all_partitions(string):
        for cutpoints in range(1 << (len(string)-1)):
            result = []
            lastcut = 0
            for i in range(len(string)-1):
                if (1<<i) & cutpoints != 0:
                    result.append(string[lastcut:(i+1)])
                    lastcut = i+1
            result.append(string[lastcut:])
            yield result
    
    for partition in all_partitions("abcd"):
        print(partition)
    

    Memory usage:

    I think my solution uses O(n) memory with Python 3. Only one partition is generated at a time, it's printed and not referenced anymore. This changes of course, if you keep all results, e.g. by storing them in a list.

    In Python 2 replace range with xrange, otherwise all possible cutpoints will be stored in a list, therefore needing an exponential amount of memory.


    JavaScript solution

    // ES6 generator
    function* all_partitions(string) {
        for (var cutpoints = 0; cutpoints < (1 << (string.length - 1)); cutpoints++) {
            var result = [];
            var lastcut = 0;
            for (var i = 0; i < string.length - 1; i++) {
                if (((1 << i) & cutpoints) !== 0) {
                    result.push(string.slice(lastcut, i + 1));
                    lastcut = i + 1;
                }
            }
            result.push(string.slice(lastcut));
            yield result;
        }
    }
    
    for (var partition of all_partitions("abcd")) {
        console.log(partition);
    }
    

    Tested with NodeJS v4.4.3 (disclaimer: I have not used NodeJS before).