python nlp nltk python-itertools

Get all pairs of right-branching words from a sentence


Given that I have a string like:

 'velvet evening purse bags'

how can I get all word pairs from it? In other words, all 2-word combinations that keep the words in their original order:

'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'

I know Python's NLTK package can produce bigrams, but those only cover adjacent words, and I'm looking for something beyond that. Or do I have to write my own custom function?


Solution

  • You can use itertools.combinations for this:

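    from itertools import combinations
    
    from nltk import word_tokenize  # requires NLTK's tokenizer data to be installed
    
    s = 'velvet evening purse bags'
    
    # Split the sentence into word tokens
    words = word_tokenize(s)
    
    # combinations() preserves input order, so every pair reads left to right
    pairs = [' '.join(comb) for comb in combinations(words, 2)]
    
    print(pairs)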
    

    Output:

    ['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
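
If the input is plain whitespace-separated text like the example, the same idea can be sketched without NLTK by using `str.split`. The helper name `ordered_word_pairs` is hypothetical and only for illustration; it assumes punctuation does not need to be separated from words, which is what `word_tokenize` would otherwise handle.

```python
from itertools import combinations


def ordered_word_pairs(sentence):
    """Return every 2-word combination, keeping the original word order.

    Hypothetical helper: assumes words are separated by whitespace only.
    """
    words = sentence.split()
    # combinations() yields tuples in input order, so the earlier word
    # in the sentence always comes first in each pair.
    return [' '.join(pair) for pair in combinations(words, 2)]


print(ordered_word_pairs('velvet evening purse bags'))
```

A sentence with n words gives n * (n - 1) / 2 pairs, so the four-word example produces six. An input with fewer than two words gives an empty list.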