
Training epochs interpretation during spaCy NER training


I was training my NER model with transformers, and I'm not really sure why the training stopped where it did, or why it even ran through so many batches. This is what the relevant part of my configuration file looks like:

[training]
train_corpus = "corpora.train"
dev_corpus = "corpora.dev"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 2
max_steps = 0
eval_frequency = 200
frozen_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.00005
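
As I understand the compounding schedule above, the batch size starts at 100 and grows by 0.1% per batch until it is capped at 1000, roughly like this (my own sketch of what compounding.v1 does, not spaCy's actual code):

def compounding(start, stop, compound):
    # Sketch of the "compounding.v1" schedule: each value is the previous
    # one multiplied by `compound`, capped at `stop`.
    curr = start
    while True:
        yield min(curr, stop)
        curr *= compound

schedule = compounding(start=100, stop=1000, compound=1.001)
sizes = [next(schedule) for _ in range(3000)]
print(round(sizes[0]))     # 100
print(round(sizes[1000]))  # ~272
print(round(sizes[2400]))  # 1000 (the cap is reached after roughly 2300 batches)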

And this is the training log:

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 5e-05
E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  -------------  --------  ------  ------  ------  ------
  0       0         398.75     40.97    2.84    3.36    2.46    0.03
  0     200         906.30   1861.38   94.51   94.00   95.03    0.95
  0     400         230.06   1028.51   98.10   97.32   98.89    0.98
  0     600          90.22   1013.38   98.99   98.40   99.58    0.99
  0     800          80.64   1131.73   99.02   98.25   99.81    0.99
  0    1000          98.50   1260.47   99.50   99.16   99.85    1.00
  0    1200          73.32   1414.91   99.49   99.25   99.73    0.99
  0    1400          84.94   1529.75   99.70   99.56   99.85    1.00
  0    1600          55.61   1697.55   99.75   99.63   99.87    1.00
  0    1800          80.41   1936.64   99.75   99.63   99.87    1.00
  0    2000         115.39   2125.54   99.78   99.69   99.87    1.00
  0    2200          63.06   2395.48   99.80   99.75   99.85    1.00
  0    2400         104.14   2574.36   99.87   99.79   99.96    1.00
  0    2600          86.07   2308.35   99.88   99.79   99.97    1.00
  0    2800          81.05   1853.15   99.90   99.87   99.93    1.00
  0    3000          52.67   1462.61   99.96   99.93   99.99    1.00
  0    3200          57.99   1154.62   99.94   99.91   99.97    1.00
  0    3400         110.74    847.50   99.90   99.85   99.96    1.00
  0    3600          90.49    621.99   99.90   99.91   99.90    1.00
  0    3800          51.03    378.93   99.87   99.78   99.97    1.00
  0    4000          93.40    274.80   99.95   99.93   99.97    1.00
  0    4200         138.98    203.28   99.91   99.87   99.96    1.00
  0    4400         106.16    127.60   99.70   99.75   99.64    1.00
  0    4600          70.28     87.25   99.95   99.94   99.96    1.00
✔ Saved pipeline to output directory
training/model-last

I was trying to train my model for 2 epochs (max_epochs=2); my train file has around 123,591 examples and my dev file has 2,522 examples.

My questions are:

  • Since my minimum batch size is 100, I expected training to end before the step-2400 evaluation, right? Reaching step 2400 implies at least 2400*100 = 240,000 examples have been processed, and actually even more than that, since the batch size keeps increasing. So why did it go all the way to step 4600?

  • The training ended on its own, but the E column still reads epoch 0. Why is that?

Edit: Following up on my 2nd bullet point, I'm curious why training went all the way to 4600 batches: 4600 batches at a minimum batch size of 100 means 4600*100 = 460,000 examples, and I only gave 123,591 training examples, so I'm clearly well past the 1st epoch, yet E still reads 0.
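
Here is the arithmetic behind my confusion, assuming (as I did) that the batch size of 100 counts examples:

train_examples = 123591
min_batch_size = 100        # `start` from the compounding schedule
steps_seen = 4600           # last value in the # column of the log

min_examples_consumed = steps_seen * min_batch_size       # 460,000
epochs_implied = min_examples_consumed / train_examples   # well past 1 epoch
print(min_examples_consumed, round(epochs_implied, 2))    # 460000 3.72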


Solution

  • There's an entry for this in the FAQ, but to summarize:

    • max_steps is the maximum number of training steps (individual batches, not evaluations).
    • max_epochs is the maximum number of epochs.
    • If training goes patience steps without the evaluation score improving, it stops. That is what stopped your training.

    It looks like your model has already reached an essentially perfect score, so I'm not sure why early stopping is a problem in this case, but that's what's happening.
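
    To make the patience mechanism concrete, here is a rough sketch of the stopping condition (my own illustration, not spaCy's actual code), assuming the score being tracked follows the ENTS_F column in your log:

def early_stop_step(scores_by_step, patience):
    # Rough illustration of patience-based early stopping: remember the step
    # with the best dev score so far; once `patience` steps pass without a
    # new best, training stops.
    best_step, best_score = 0, float("-inf")
    for step, score in scores_by_step:
        if score > best_score:
            best_step, best_score = step, score
        elif step - best_step >= patience:
            return step
    return None  # never triggered

# From your log: the best ENTS_F (99.96) is at step 3000 and no later
# evaluation beats it, so 1600 steps later (step 4600) training stops.
log = [(3000, 99.96), (3200, 99.94), (3400, 99.90), (3600, 99.90),
       (3800, 99.87), (4000, 99.95), (4200, 99.91), (4400, 99.70),
       (4600, 99.95)]
print(early_stop_step(log, patience=1600))  # 4600

    In other words, the run ended because 1600 steps passed after the best score at step 3000, not because an epoch or step limit was hit.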