That is, for a sentence break it up into all possible combinations of in-order words, with no words omitted
For example, for input "The cat sat on the mat"
output
[("The", "cat sat on the mat"),
("The cat", "sat on the mat"),
("The cat", "sat", "on the mat")] #etc
but not
("The mat", "cat sat on the") # out of order
("The cat"), ("mat") # words missing
I looked at methods in itertools, but can't see that they do the job, as combinations will miss items out ("the cat", "mat") and permutations will change the order.
Am I missing something in these tools, or are they just not right ones?
(For the sake of clarity, this is not a question about how to split a string, but how to get combinations)
Modifying Raymond Hettinger's partition recipe for Python 3 as inspired by this blog post from WordAligned, and every partition case with your list, we can accomplish this with chain
and combinations
from itertools.
from itertools import chain, combinations
def partition(iterable):
n = len(input_list)
b, mid, e = [0], list(range(1, n)), [n]
getslice = input_list.__getitem__
splits = (d for i in range(n) for d in combinations(mid, i))
return [[input_list[sl] for sl in map(slice, chain(b, d), chain(d, e))]
for d in splits]
Demo:
>>> print(partition(input_list))
[[['The', 'cat', 'sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the', 'mat']], [['The', 'cat', 'sat', 'on'], ['the', 'mat']], [['The', 'cat', 'sat', 'on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on'], ['the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the'], ['mat']], [['The', 'cat'], ['sat'], ['on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on'], ['the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the'], ['mat']], [['The', 'cat', 'sat', 'on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on', 'the', 'mat']], [['The'], ['cat'], ['sat', 'on'], ['the', 'mat']], [['The'], ['cat'], ['sat', 'on', 'the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the'], ['mat']], [['The'], ['cat', 'sat', 'on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the', 'mat']], [['The', 'cat'], ['sat'], ['on', 'the'], ['mat']], [['The', 'cat'], ['sat', 'on'], ['the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the', 'mat']], [['The'], ['cat'], ['sat'], ['on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on'], ['the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the'], ['mat']]]