I have a list of sentences and I want to create skip-grams (window size = 3),
but I don't want the window to span across sentences, since they are all unrelated.
So, if I have the sentences:
[["my name is John"] , ["This PC is black"]]
the triplets will be:
[my name is]
[name is John]
[This PC is]
[PC is black]
What is the best way to do it?
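Note that the triplets above are contiguous trigrams. Word2vec-style skip-gram training usually works with (center, context) pairs instead. The sketch below shows that variant under the same "never cross a sentence" rule. It is not from the original question, and it makes two assumptions: "window size = 3" means one word on each side of the center word, and each sentence is a one-element list holding a whitespace-separated string. The function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(sentences, window=1):
    """Yield (center, context) pairs without crossing sentence boundaries.

    Hypothetical helper: `window` is the number of words on each side of
    the center word (window=1 gives a 3-word span).
    """
    for sentence in sentences:
        # Each sentence is assumed to be a one-element list holding a string.
        tokens = sentence[0].split()
        for i, center in enumerate(tokens):
            # Clamp the context range to this sentence's own tokens.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, tokens[j]

sentences = [["my name is John"], ["This PC is black"]]
pairs = list(skipgram_pairs(sentences, window=1))
print(pairs)
```

Because the token list is rebuilt for every sentence, a pair such as `('John', 'This')` can never be produced.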
Try this: apply `nltk.ngrams` to each sentence separately, so no window can cross a sentence boundary.
```python
from nltk import ngrams

def generate_ngrams(sentences, window_size=3):
    # Each sentence is a one-element list holding a string; n-grams are
    # built per sentence, so no window ever spans two sentences.
    for sentence in sentences:
        yield from ngrams(sentence[0].split(), window_size)

sentences = [["my name is John"], ["This PC is black"]]
for c in generate_ngrams(sentences, 3):
    print(c)
```

Output:

```
('my', 'name', 'is')
('name', 'is', 'John')
('This', 'PC', 'is')
('PC', 'is', 'black')
```
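If you'd rather not depend on NLTK, the same per-sentence windows can be built with plain Python by zipping shifted copies of each sentence's token list. This is a minimal sketch. It assumes the same input shape (one string per inner list, split on whitespace). The name `generate_windows` is hypothetical.

```python
def generate_windows(sentences, window_size=3):
    """Yield contiguous windows of `window_size` tokens, one sentence at a time."""
    for sentence in sentences:
        tokens = sentence[0].split()
        # zip over tokens[0:], tokens[1:], ... stops at the shortest slice,
        # so a sentence shorter than window_size yields no windows at all.
        yield from zip(*(tokens[i:] for i in range(window_size)))

sentences = [["my name is John"], ["This PC is black"]]
for window in generate_windows(sentences, 3):
    print(window)
```

Since the tokens are split per sentence, the last words of one sentence are never combined with the first words of the next.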