I have docs in spaCy that use spans, such as:
sent = 'I eat 5 apples and 2 bananas.'
doc = nlp(sent)
doc.spans['sc'] = [
Span(doc, 2, 3, 'Ingredient'),
Span(doc, 5, 6, 'Ingredient'),
Span(doc, 2, 6, 'Meal')]
How can I iterate over all spans with the label 'Meal' and show the spans that lie completely within the boundaries of those span(s)? I know there is something for ents within spans, but that is not what I'm looking for.
spaCy's SpanGroup object has a useful has_overlap property, which tells you whether any spans in the group overlap at all, so it works as an initial check. Then you can take a straightforward approach: write a couple of loops or list comprehensions that compare the .start and .end properties of your spans.
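Span offsets are token indices with an exclusive end, like Python slices, so "inner lies completely within outer" reduces to outer.start <= inner.start and inner.end <= outer.end. Here is a minimal sketch of that check on plain tuples; SimpleSpan and spans_within are hypothetical names that only stand in for spaCy's Span so the logic can be shown without loading a pipeline:

```python
from typing import List, NamedTuple


class SimpleSpan(NamedTuple):
    """Hypothetical stand-in for a spaCy Span: token offsets plus a label."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def spans_within(outer: SimpleSpan, candidates: List[SimpleSpan]) -> List[SimpleSpan]:
    """Return the candidates lying completely inside `outer`, excluding `outer` itself."""
    return [
        s for s in candidates
        if s is not outer and outer.start <= s.start and s.end <= outer.end
    ]


# Offsets mirror the spans used in the snippet below.
spans = [
    SimpleSpan(0, 1, "Subject"),
    SimpleSpan(1, 2, "Verb"),
    SimpleSpan(3, 4, "Ingredient"),
    SimpleSpan(6, 7, "Ingredient"),
    SimpleSpan(2, 7, "Meal"),
]

for meal in (s for s in spans if s.label == "Meal"):
    print(meal, "->", spans_within(meal, spans))
```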
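Here's how I would write a snippet to handle such a task. Note that I adjusted the token offsets so the Ingredient spans cover apples and bananas; in your example, Span(doc, 2, 3) covers the token 5 and Span(doc, 5, 6) covers the token 2.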
import spacy
from spacy.tokens import Span

nlp = spacy.load('en_core_web_sm')
sent = 'I eat 5 apples and 2 bananas.'
doc = nlp(sent)
# Token indices: I(0) eat(1) 5(2) apples(3) and(4) 2(5) bananas(6) .(7)
doc.spans['sc'] = [
    Span(doc, 0, 1, 'Subject'),
    Span(doc, 1, 2, 'Verb'),
    Span(doc, 3, 4, 'Ingredient'),
    Span(doc, 6, 7, 'Ingredient'),
    Span(doc, 2, 7, 'Meal')]

# Only search if at least two spans in the group overlap
if doc.spans['sc'].has_overlap:
    # (start, end) token offsets of every 'Meal' span
    meal_start_ends = [(span.start, span.end) for span in doc.spans['sc'] if span.label_ == 'Meal']
    # For each meal, collect the 'Ingredient' spans lying entirely inside it
    meal_ingredients = [[ig for ig in doc.spans['sc']
                         if ig.start >= meal[0] and ig.end <= meal[1] and ig.label_ == 'Ingredient']
                        for meal in meal_start_ends]
    print(meal_ingredients)
This little snippet should print out [[apples, bananas]], which is hopefully what you wanted to achieve.
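If you want every span inside each 'Meal' regardless of its label (closer to the original question), you can drop the label filter and pair each outer span with its contents. The sketch below assumes spaCy v3 (for doc.spans) and uses spacy.blank("en") instead of en_core_web_sm, since Span only needs a tokenized Doc; contained_spans is a hypothetical helper name, not part of spaCy's API:

```python
import spacy
from spacy.tokens import Span

# A blank English pipeline only tokenizes, which is enough to build Spans
# and avoids downloading a trained model.
nlp = spacy.blank("en")
doc = nlp("I eat 5 apples and 2 bananas.")
doc.spans["sc"] = [
    Span(doc, 0, 1, "Subject"),
    Span(doc, 1, 2, "Verb"),
    Span(doc, 3, 4, "Ingredient"),
    Span(doc, 6, 7, "Ingredient"),
    Span(doc, 2, 7, "Meal"),
]


def contained_spans(group, outer_label):
    """Return (outer, [inner, ...]) pairs for each span labelled `outer_label`,
    where every inner span lies completely within the outer span's tokens."""
    spans = list(group)
    result = []
    for i, outer in enumerate(spans):
        if outer.label_ != outer_label:
            continue
        inner = [
            s for j, s in enumerate(spans)
            if j != i  # skip the outer span itself (compared by position)
            and outer.start <= s.start
            and s.end <= outer.end
        ]
        result.append((outer, inner))
    return result


for meal, inner in contained_spans(doc.spans["sc"], "Meal"):
    print(meal.text, "->", [s.text for s in inner])
```

The positional index avoids relying on Span identity, because a SpanGroup may hand back a fresh Span object on each access. The nested loop is quadratic in the number of spans, which is fine for a handful per doc; for very large groups, sorting spans by start offset first would let you stop scanning early.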