I have a paragraph of text. I want to count how often each pair of adjacent words occurs (the two words have to be next to each other). For example:
"I have 2 laptops, I have 2 chargers"
The result should be:
"I have": 2
"have 2": 2
"2 laptops": 1
"laptops, I": (don't count, the words are separated by a comma)
"2 chargers": 1
I tried a regex, but it never uses a word twice, so overlapping pairs are missed.
I used: \b[a-z]{1,20}\b \b[a-z]{1,20}\b
Text: cold chain, energy storage device, industrial cooling system
It almost works, but it doesn't include pairs such as "storage device" and "cooling system", because those words have already been used by "energy storage" and "industrial cooling".
I'd appreciate your advice.
You can use zip to get every pair of adjacent words and then Counter to get the frequency of each pair:
>>> from collections import Counter
>>> text = "I have 2 laptops, I have 2 chargers"
>>> words = text.split()
>>> pairs = Counter(zip(words, words[1:]))  # every adjacent (word, next_word) pair, overlaps included
>>> d = {f"{a} {b.rstrip(',')}": n for (a, b), n in pairs.items() if not a.endswith(',')}  # skip pairs that cross a comma
>>> print(d)
{'I have': 2, 'have 2': 2, '2 laptops': 1, '2 chargers': 1}
>>> import json
>>> print(json.dumps(d, indent=4))
{
    "I have": 2,
    "have 2": 2,
    "2 laptops": 1,
    "2 chargers": 1
}
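If you'd rather keep the regex approach from the question, here is why it skips "storage device": re.findall consumes the characters of each match, so "storage" is already used up by "energy storage". A common workaround is to put the pair inside a zero-width lookahead. The lookahead consumes nothing, so the next match can start on the second word, and findall returns the capture group inside it. The following is a minimal sketch of that idea. The function name count_pairs_regex is hypothetical, and using \w+ instead of [a-z]{1,20} is an assumption so that words like "I" and "2" are matched too.

```python
import re
from collections import Counter

# Zero-width lookahead: nothing is consumed, so overlapping pairs are all found.
# Assumption: a "word" is a run of word characters (\w+), and the two words of a
# pair are separated by exactly one space. A comma between them breaks the pair.
PAIR_PATTERN = re.compile(r"(?=\b(\w+ \w+)\b)")


def count_pairs_regex(text):
    """Hypothetical helper: count adjacent word pairs, including overlapping ones."""
    # findall returns the contents of the capturing group for each match position
    return Counter(PAIR_PATTERN.findall(text))


print(dict(count_pairs_regex("cold chain, energy storage device, industrial cooling system")))
print(dict(count_pairs_regex("I have 2 laptops, I have 2 chargers")))
```

Because a comma is neither a word character nor a space, "laptops, I" and "device, industrial" never match, so no separate filtering step is needed.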
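A variation on the zip and Counter idea is to split the text at punctuation first and then pair words only within each piece. With this approach, a pair can never span a comma, and "2 laptops" comes out without the trailing comma, so there is no need to filter or strip afterwards. This is a sketch under assumptions. The helper name count_adjacent_pairs is hypothetical, and the set of boundary characters (, . ; : ! ?) is a guess you should adjust to your text. For example, "." would also split a decimal like 2.5.

```python
import re
from collections import Counter


def count_adjacent_pairs(text):
    """Hypothetical helper: count pairs of adjacent words, never pairing across punctuation."""
    counts = Counter()
    # Assumption: these punctuation marks are boundaries a pair must not cross.
    for segment in re.split(r"[,.;:!?]", text):
        words = segment.split()
        # zip(words, words[1:]) yields every overlapping (word, next_word) pair
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return dict(counts)


print(count_adjacent_pairs("I have 2 laptops, I have 2 chargers"))
print(count_adjacent_pairs("cold chain, energy storage device, industrial cooling system"))
```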