How to make a dict of n-grams from my DataFrame that start with some string (Python)

I have a DataFrame like this:

id  name        cat     subcat
-------------------------------
1   aa bb cc    A       a-a
2   bb cc dd    B       b-a
3   aa bb ee    C       c-a
4   aa gg cc    D       d-a

I want to build a dict from this DataFrame that counts the two-word n-grams (bigrams) in the name column, most frequent first, like this:

aa bb : 2
bb cc : 2
cc dd : 1
bb ee : 1
aa gg : 1
gg cc : 1

Solution

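Update: using the pairwise recipe from the itertools documentation (on Python 3.10+, itertools.pairwise is built in), which pairs only adjacent words: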

import pandas as pd
from itertools import chain, tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

pd.Series(chain(*df['name'].str.split(' ')
                           .apply(lambda x: pairwise(x))))\
  .value_counts()

Output:

(aa, bb)    2
(bb, cc)    2
(cc, dd)    1
(bb, ee)    1
(aa, gg)    1
(gg, cc)    1
dtype: int64
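
Since the question asks for a dict with keys like "aa bb" rather than a Series of tuples, here is a minimal sketch of the same adjacent-pair count using only the standard library. The names ngrams, names, and bigram_counts are hypothetical, and the list of names is copied from the question's sample data; with the real DataFrame you would presumably iterate over df['name'] instead.

```python
from collections import Counter


def ngrams(words, n=2):
    """Return runs of n consecutive words as tuples (hypothetical helper)."""
    return zip(*(words[i:] for i in range(n)))


# Sample names copied from the question; with the DataFrame this
# would be something like df['name'].
names = ["aa bb cc", "bb cc dd", "aa bb ee", "aa gg cc"]

counter = Counter()
for name in names:
    # Split on whitespace and count each pair of adjacent words.
    counter.update(ngrams(name.split(), 2))

# Join each tuple into a "w1 w2" key, most frequent first.
bigram_counts = {" ".join(gram): count for gram, count in counter.most_common()}
print(bigram_counts)
```

Changing the second argument of ngrams to 3 should give trigrams in the same way.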

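Original answer: IIUC, you can try something like this. Note that combinations pairs every two words in a name, not only adjacent ones, so it also counts pairs such as (aa, cc):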

import pandas as pd
from itertools import combinations, chain

pd.Series(list(chain(*df['name'].str.split(' ')
                                .apply(lambda x: list(combinations(x, 2))))))\
  .value_counts()

Output:

(aa, bb)    2
(aa, cc)    2
(bb, cc)    2
(bb, dd)    1
(cc, dd)    1
(aa, ee)    1
(bb, ee)    1
(aa, gg)    1
(gg, cc)    1
dtype: int64
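
To see why the two outputs above differ, here is a small sketch comparing adjacent pairs with all combinations for a single name from the question. The variable names are hypothetical, and zip(words, words[1:]) stands in for the pairwise recipe.

```python
from itertools import combinations

words = "aa bb cc".split()

# Adjacent pairs only: what a two-word n-gram (bigram) normally means.
adjacent = list(zip(words, words[1:]))

# Every pair of words in order of appearance, adjacent or not.
all_pairs = list(combinations(words, 2))

print(adjacent)   # [('aa', 'bb'), ('bb', 'cc')]
print(all_pairs)  # [('aa', 'bb'), ('aa', 'cc'), ('bb', 'cc')]
```

The extra (aa, cc) pair is why the combinations output has more rows than the output requested in the question.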