I'm a beginner at NLP and I'm using Gensim for the first time. I noticed that for some texts it returns a blank summary. For example:
from gensim.summarization import summarize
text = "The continued digitization of most every sector of society and industry means that an ever-growing volume of data will continue to be generated. The ability to gain insights from these vast datasets is one key to addressing an enormous array of issues — from identifying and treating diseases more effectively, to fighting cyber criminals, to helping organizations operate more effectively to boost the bottom line."
summarize(text, 0.6)
returns:
''
When I have equivalently sized paragraphs in other instances it returns a summary, so I know it's not that my ratio is too small. Any insights appreciated!
For the sake of this answer I'll assume Gensim version 3.8.3. This is the latest version that (currently) supports summarization, since there are no API stubs for it in version 4 anymore.
Specifically, when looking at the reference for summarize(), we can read the following:
Get a summarized version of the given text.
The output summary will consist of the most representative sentences and will be returned as a string, divided by newlines.
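To see what "sentences" means for the text in the question, here is a rough sketch that counts them. Note that the regex split and the split_sentences helper are my own simplification for illustration, not Gensim's actual tokenizer:

```python
import re

text = (
    "The continued digitization of most every sector of society and industry "
    "means that an ever-growing volume of data will continue to be generated. "
    "The ability to gain insights from these vast datasets is one key to "
    "addressing an enormous array of issues \u2014 from identifying and treating "
    "diseases more effectively, to fighting cyber criminals, to helping "
    "organizations operate more effectively to boost the bottom line."
)


def split_sentences(raw):
    # Hypothetical helper: naive split on whitespace that follows '.', '!' or '?'.
    # This only approximates sentence boundaries; it is not what Gensim does internally.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw) if s.strip()]


sentences = split_sentences(text)
print(len(sentences))  # 2
```

So even though the paragraph is fairly long, a sentence-level summarizer only has two units to choose from here.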
The part about "the most representative sentences" also explains why your output is empty: Gensim employs an extractive summarizer, which can only select whole sentences, not parts of sentences. Therefore, it either selects an entire sentence (resulting in no real "summarization") or returns an empty answer. Fixing this problem is not trivial, and I think you only have two (sub-optimal) choices: