I am using the code below to create a list of sentences from a document. The function returns a list of sentences:
def extract_sentences(file):
    content = nlp(file)
    sentences = list(content.sents)
    return sentences
After that, I want to add each sentence to a dataframe, under the column "sentence". The problem is that in the dataframe each sentence appears as a list of words separated by commas, e.g. (this, process, includes, different, stages... ). I want it to appear as plain text instead: this process includes different stages
content.sents is a generator that yields spacy.tokens.span.Span objects.
If you want to have a list of strings as output, you can use:
def extract_sentences(file):
    content = nlp(file)
    return [x.text for x in content.sents]
Note that the .text property returns the textual representation of the Span object.
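For completeness, here is a minimal sketch of the whole flow, from text to a dataframe with a "sentence" column. It rests on a few assumptions that are not in the question: a blank English pipeline with spaCy's rule-based sentencizer stands in for your nlp object (so no trained model is needed), the sample text is made up, and the parameter is renamed to text because nlp expects a string rather than a file handle.

```python
import pandas as pd
import spacy

# Assumption: a blank English pipeline with the rule-based sentencizer
# stands in for the asker's `nlp` object (no trained model download needed).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def extract_sentences(text):
    """Return the sentences of `text` as plain Python strings."""
    content = nlp(text)
    # .text converts each Span into a str, so pandas stores plain text
    # instead of a token sequence.
    return [sent.text for sent in content.sents]


# Hypothetical sample text, for illustration only.
sample = "This process includes different stages. Each stage is documented."

# One row per sentence, under the column "sentence".
df = pd.DataFrame({"sentence": extract_sentences(sample)})
print(df)
```

If you load a trained pipeline such as one created with spacy.load, the function body stays the same; only the sentence boundaries may differ, since they then come from the model rather than from punctuation rules.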