python-3.x · spacy · named-entity-recognition · spacy-3

Train Spacy model with larger-than-RAM dataset


I asked this question to better understand some of the nuances between training Spacy models with DocBins serialized to disk versus loading Example instances via a custom data loading function. The goal is to train a Spacy NER model on more data than can fit into RAM (or at least to avoid loading the entire training file into RAM at once). A custom data loader seems like one specific way to accomplish this, but I am writing this question to ask more generally:

How can one train a Spacy model without loading the entire training data set file during training?


Solution

  • Your only options are using a custom data loader or setting max_epochs = -1 in the training config, which tells Spacy to stream the training corpus instead of loading and shuffling it all up front. See the Spacy training docs on custom data readers; a rough sketch of a streaming reader follows below.
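
If you take the custom data loader route, the general shape is a reader function registered with Spacy's registry that lazily yields Example objects, so only one record needs to be in memory at a time. The sketch below is a minimal illustration, assuming the training data sits in a JSONL file with hypothetical field names ("text" and "entities"); adapt the format to your own data.

```python
# Minimal sketch of a streaming corpus reader, assuming each JSONL line is
# {"text": "...", "entities": [[start, end, "LABEL"], ...]}.
# The reader name, file layout, and field names here are hypothetical.
import json
from typing import Callable, Iterator

import spacy
from spacy.language import Language
from spacy.training import Example


@spacy.registry.readers("stream_jsonl_ner.v1")
def create_jsonl_reader(path: str) -> Callable[[Language], Iterator[Example]]:
    def read_examples(nlp: Language) -> Iterator[Example]:
        # Yield one Example at a time so only the current record is in RAM.
        with open(path, encoding="utf8") as infile:
            for line in infile:
                record = json.loads(line)
                doc = nlp.make_doc(record["text"])
                yield Example.from_dict(doc, {"entities": record["entities"]})

    return read_examples
```

In the training config you would then point `[corpora.train]` at the reader (`@readers = "stream_jsonl_ner.v1"` plus its `path` argument), set `max_epochs = -1` under `[training]` so the corpus is streamed rather than loaded whole, and pass the module containing the reader to training via `--code`, e.g. `python -m spacy train config.cfg --code reader.py`.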