
DocBin.merge method in spaCy


The following code does not work, even though it is the example suggested in the documentation:

import spacy # V2.2
from spacy.tokens import DocBin # V2.2
nlp = spacy.load('en_core_web_sm') # V2.2

doc_bin1 = DocBin(attrs=["LEMMA", "POS"])
doc_bin1.add(nlp("Hello world"))
doc_bin2 = DocBin(attrs=["LEMMA", "POS"])
doc_bin2.add(nlp("This is a sentence"))
merged_bins = doc_bin1.merge(doc_bin2)
assert len(merged_bins) == 2

It raises the following error:

---> assert len(merged_bins) == 2
TypeError: object of type 'NoneType' has no len()

What's the solution?


Solution

  • This looks like a mistake in the example. doc_bin1.merge(doc_bin2) merges doc_bin2 into doc_bin1 in place and doesn't return a value, so merged_bins is None and len(merged_bins) raises the TypeError. The final lines should be:

    doc_bin1.merge(doc_bin2)
    assert len(doc_bin1) == 2
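
For reference, here is a minimal end-to-end sketch of the corrected example. It assumes spacy.blank("en") instead of en_core_web_sm so that it runs without a model download (that substitution is mine, not part of the question), and the variable names result and texts are only illustrative.

```python
import spacy
from spacy.tokens import DocBin

# Assumption: a blank English pipeline is enough here, since only
# tokenization is needed to show how merge() behaves.
nlp = spacy.blank("en")

doc_bin1 = DocBin(attrs=["LEMMA", "POS"])
doc_bin1.add(nlp("Hello world"))
doc_bin2 = DocBin(attrs=["LEMMA", "POS"])
doc_bin2.add(nlp("This is a sentence"))

# merge() extends doc_bin1 in place; its return value is None.
result = doc_bin1.merge(doc_bin2)
assert result is None

# The merged docs now live in doc_bin1.
assert len(doc_bin1) == 2

# Recover the Doc objects to confirm both texts are present.
texts = [doc.text for doc in doc_bin1.get_docs(nlp.vocab)]
print(texts)
```

Note that both DocBin objects are created with the same attrs; merging bins that were built with different attributes is not expected to work.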