python regex pattern-matching frequency word-count

Calculate the frequency of all combinations of 2 words in Python


I have a paragraph of text. I want to count all the possible combinations of 2 words (the 2 words have to be next to each other). For example:

"I have 2 laptops, I have 2 chargers"

The result should be:

"I have": 2
"have 2": 2
"2 laptops": 1
"laptops, I": (don't count)
"2 chargers": 1

I tried regex, but the problem is that it doesn't count a string twice.

I used: \b[a-z]{1,20}\b \b[a-z]{1,20}\b

Text: cold chain, energy storage device, industrial cooling system

It almost works, but it doesn't include pairs such as "storage device" and "cooling system", because it has already taken "energy storage" and "industrial cooling".
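
The pairs go missing because regex matches cannot overlap: once "energy storage" has been matched, the search resumes after "storage". One possible workaround, sketched here on the assumption that the same lowercase-only word pattern is wanted, is to put the pair inside a lookahead so that no characters are consumed. The names `pattern` and `pair_counts` are illustrative only.

```python
import re
from collections import Counter

text = "cold chain, energy storage device, industrial cooling system"

# The lookahead (?=...) checks for a pair without consuming it, so the
# next search can start at the following word and pairs may overlap.
# Assumption: words are lowercase a-z only, as in the pattern above.
pattern = re.compile(r"\b(?=([a-z]{1,20} [a-z]{1,20})\b)")

# findall returns the captured group (the word pair) for every match.
pair_counts = Counter(pattern.findall(text))
print(pair_counts)
```

Because the pattern requires exactly one space between the two words, pairs that span a comma, such as "chain, energy", are not matched.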

I would appreciate your advice.


Solution

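  • You can use zip to pair each word with the word that follows it, and then use Counter to get the frequency of each pair. The condition at the end skips any pair whose first word ends with a comma.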

    >>> from collections import Counter
    >>> text = "I have 2 laptops, I have 2 chargers"
    >>> words = text.split()
    
    >>> # Count adjacent pairs; skip pairs whose first word ends with a comma
    >>> d = {' '.join(pair): n for pair, n in Counter(zip(words, words[1:])).items() if not pair[0].endswith(',')}
    >>> print(d)
    {'I have': 2, 'have 2': 2, '2 laptops,': 1, '2 chargers': 1}
    
    >>> import json
    >>> print(json.dumps(d, indent=4))
    {
        "I have": 2,
        "have 2": 2,
        "2 laptops,": 1,
        "2 chargers": 1
    }
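
The result above still carries the trailing comma in "2 laptops,". A possible variation, offered as a sketch rather than as part of the original answer, splits the text at punctuation first and pairs words only within each chunk. The function name `count_adjacent_pairs` and the choice of punctuation characters are assumptions made for illustration.

```python
import re
from collections import Counter

def count_adjacent_pairs(text):
    """Count pairs of adjacent words, never pairing across punctuation."""
    counts = Counter()
    # Assumption: any of , . ; : ! ? ends a chunk of adjacent words.
    for chunk in re.split(r"[,.;:!?]+", text):
        words = chunk.split()
        # zip pairs each word with its successor inside the same chunk.
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts

pair_counts = count_adjacent_pairs("I have 2 laptops, I have 2 chargers")
print(dict(pair_counts))
# {'I have': 2, 'have 2': 2, '2 laptops': 1, '2 chargers': 1}
```

Since pairs are built per chunk, "laptops, I" is never formed and no filtering step is needed afterwards.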