I am using Yake (Yet Another Keyword Extractor) to extract keywords from a dataframe. I want to extract only bigrams and trigrams, but Yake only lets me set a maximum n-gram size, not a minimum. How would I remove the unigrams?
Example row (df.head(1)):
Text: 'oui , yes , i mumbled , the linguistic transition now in limbo .'
Keywords: '[('oui', 0.04491197687864554), ('linguistic transition', 0.09700399286574239), ('mumbled', 0.15831692877998726)]'
I want to remove 'oui' and 'mumbled', along with their scores, from the Keywords column.
Thank you for your time!
If the problem is that the keywords list contains some unigrams, you can simply filter out every keyword that contains no space and build a new list. Here is an example:
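# Example (keyword, score) pairs taken from the question
keywords = [('oui', 0.04491197687864554), ('linguistic transition', 0.09700399286574239), ('mumbled', 0.15831692877998726)]

# Keep only the keywords that contain a space, i.e. more than one word
keywords_without_unigrams = []
for kw in keywords:
    if ' ' in kw[0]:
        keywords_without_unigrams.append(kw)

for kw in keywords_without_unigrams:
    print(kw)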
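If you want to apply the same idea to the whole Keywords column, something like the following might work. It is only a sketch: `keep_ngrams`, `min_n` and `max_n` are names I made up for illustration, and I am assuming each cell holds either a list of (keyword, score) tuples or the string form of such a list, as the quoted example in the question suggests. It counts words by splitting on whitespace, so it keeps bigrams and trigrams and drops everything else.

```python
import ast


def keep_ngrams(keywords, min_n=2, max_n=3):
    """Keep only (keyword, score) pairs whose keyword has min_n..max_n words."""
    # Assumption: a cell may hold the list itself or its string representation.
    if isinstance(keywords, str):
        keywords = ast.literal_eval(keywords)
    # Count words by splitting on whitespace and keep the pairs in range.
    return [(kw, score) for kw, score in keywords
            if min_n <= len(kw.split()) <= max_n]


# Example cell taken from the question (stored as a string here).
row = "[('oui', 0.04491197687864554), ('linguistic transition', 0.09700399286574239), ('mumbled', 0.15831692877998726)]"
print(keep_ngrams(row))
```

On a dataframe you could then write df['Keywords'] = df['Keywords'].apply(keep_ngrams), assuming the column is really named Keywords.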