looking for an efficient way to split columns in a text in pandas

I have pandas dataframe and want to split the text column in such a way that each row has just two words. when splitting, I need to maintain the order so that I can combine them together based on line. Is there efficient way to do this. I can do list comprehension but was looking at more efficient way. Thanks

df = pd.DataFrame({'col1':[22,23,44], 'col2': ['rr','gg','xx'], 'text': ['this is a sample text', 'this is another one','third example is a longer text']})

Solution

Using str.findall, explode, and groupby.cumcount:

out = (df.assign(text=df['text'].str.findall(r'(\S+(?:\s+\S+)?)'))
         .explode('text')
         .assign(line=lambda d: d.groupby(level=0).cumcount())
       )

Regexes variant to handle any number of words:

N=2
out = (df.assign(text=df['text'].str.findall(fr'((?:\S+\s+){{,{N-1}}}(?:\S+))\s*'))
         .explode('text')
         .assign(line=lambda d: d.groupby(level=0).cumcount())
       )

Alternative with itertools' batched recipe:

from itertools import islice

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

N = 2
out = (df.assign(text=df['text'].map(lambda x: list(map(' '.join, batched(x.split(), N)))))
         .explode('text')
         .assign(line=lambda d: d.groupby(level=0).cumcount())
       )

Output:

   col1 col2           text  line
0    22   rr        this is     0
0    22   rr       a sample     1
0    22   rr           text     2
1    23   gg        this is     0
1    23   gg    another one     1
2    44   xx  third example     0
2    44   xx           is a     1
2    44   xx    longer text     2

Example output with N=4:

   col1 col2                 text  line
0    22   rr     this is a sample     0
0    22   rr                 text     1
1    23   gg  this is another one     0
2    44   xx   third example is a     0
2    44   xx          longer text     1