I have a list of words like this:
['Urgente', 'Recibimos', 'Info']
I used the parsetree function (parsetree(x, lemmata=True)) to convert the words, and the output for each word is this:
[[Sentence('urgente/JJ/B-ADJP/O/urgente')],
[Sentence('recibimos/NN/B-NP/O/recibimos')],
[Sentence('info/NN/B-NP/O/info')]]
Each component of the list has the type pattern.text.tree.Text.
I need to obtain only the group of words inside the parentheses, but I don't know how to do this. I need this output:
[urgente/JJ/B-ADJP/O/urgente,
recibimos/NN/B-NP/O/recibimos,
info/NN/B-NP/O/info]
I used str to convert each component of the list to a string, but this changes the whole output.
From their documentation, there doesn't seem to be a direct method or property to get what you want.
But I found that a Sentence object can be printed as Sentence('urgente/JJ/B-ADJP/O/urgente') using repr. So I looked at the source code for the __repr__ implementation to see how it is formed:
    def __repr__(self):
        return "Sentence(%s)" % repr(" ".join(["/".join(word.tags) for word in self.words]))
It seems that the string "in parentheses" is a combination of words and tags. You can then reuse that code, knowing that if you already have pattern.text.tree.Text objects, "a Text is a list of Sentence objects. Each Sentence is a list of Word objects." (from the Parse trees documentation).
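To make that relationship concrete, here is a minimal sketch using stand-in classes. FakeWord and FakeSentence are hypothetical names, not part of pattern; they only mimic the tags and words attributes and reuse the __repr__ quoted above. It suggests that the join expression yields exactly the text that repr wraps in Sentence('...'):

```python
class FakeWord:
    """Hypothetical stand-in for pattern's Word: it only carries a list of tags."""
    def __init__(self, tags):
        self.tags = tags


class FakeSentence:
    """Hypothetical stand-in for pattern's Sentence: a container of words."""
    def __init__(self, words):
        self.words = words

    def __repr__(self):
        # Same expression as the __repr__ quoted above.
        return "Sentence(%s)" % repr(" ".join(["/".join(word.tags) for word in self.words]))


sentence = FakeSentence([FakeWord(["urgente", "JJ", "B-ADJP", "O", "urgente"])])

# Build the inner string directly, without going through repr().
inner = " ".join("/".join(word.tags) for word in sentence.words)
print(repr(sentence))
print(inner)
```

The real objects carry far more state than these stand-ins, so treat this only as an illustration of how the string is assembled.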
So here's my hacky solution:
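    # Assumed import (not shown in the original); use pattern.es for Spanish input.
    from pattern.en import parsetree

    # Parse each word into a pattern.text.tree.Text object.
    parsed = list()
    for data in ['Urgente', 'Recibimos', 'Info']:
        parsed.append(parsetree(data, lemmata=True))

    # Rebuild the "word/tag/..." string the same way Sentence.__repr__ does.
    output = list()
    for text in parsed:
        for sentence in text:
            formatted = " ".join(["/".join(word.tags) for word in sentence.words])
            output.append(str(formatted))
    print(output)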
Printing output gives:
['Urgente/NNP/B-NP/O/urgente', 'Recibimos/NNP/B-NP/O/recibimos', 'Info/NNP/B-NP/O/info']
Note that this solution results in a list of strs (losing all the properties/methods from the original parsetree output).
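If you still need the parsed objects later, one option is to keep each string next to the sentence it came from instead of discarding it. A minimal sketch, assuming only that each sentence exposes words and each word exposes tags, as in the code above. The format_sentence helper and the SimpleNamespace stand-ins are hypothetical, used here so the example runs without pattern installed:

```python
from types import SimpleNamespace


def format_sentence(sentence):
    """Build the 'word/tag/...' string from any object with .words and .tags."""
    return " ".join("/".join(word.tags) for word in sentence.words)


# Stand-in data shaped like Text -> Sentence -> Word (not real pattern objects).
texts = [
    [SimpleNamespace(words=[SimpleNamespace(tags=["info", "NN", "B-NP", "O", "info"])])],
    [SimpleNamespace(words=[SimpleNamespace(tags=["urgente", "JJ", "B-ADJP", "O", "urgente"])])],
]

# Keep (string, original sentence) pairs so the sentence object stays usable.
pairs = [(format_sentence(s), s) for text in texts for s in text]
strings = [formatted for formatted, _ in pairs]
print(strings)
```

With real parsetree output you would pass the parsed list in place of texts; the second element of each pair is then the original Sentence, with its methods intact.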