I was training my NER model with transformers, and I'm not really sure why the training stopped when it did, or why it even ran through that many batches. This is the relevant part of my configuration file:
[training]
train_corpus = "corpora.train"
dev_corpus = "corpora.dev"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 2
max_steps = 0
eval_frequency = 200
frozen_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.00005
And this is the training log:
============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 5e-05
E # LOSS TRANS... LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------- -------- ------ ------ ------ ------
0 0 398.75 40.97 2.84 3.36 2.46 0.03
0 200 906.30 1861.38 94.51 94.00 95.03 0.95
0 400 230.06 1028.51 98.10 97.32 98.89 0.98
0 600 90.22 1013.38 98.99 98.40 99.58 0.99
0 800 80.64 1131.73 99.02 98.25 99.81 0.99
0 1000 98.50 1260.47 99.50 99.16 99.85 1.00
0 1200 73.32 1414.91 99.49 99.25 99.73 0.99
0 1400 84.94 1529.75 99.70 99.56 99.85 1.00
0 1600 55.61 1697.55 99.75 99.63 99.87 1.00
0 1800 80.41 1936.64 99.75 99.63 99.87 1.00
0 2000 115.39 2125.54 99.78 99.69 99.87 1.00
0 2200 63.06 2395.48 99.80 99.75 99.85 1.00
0 2400 104.14 2574.36 99.87 99.79 99.96 1.00
0 2600 86.07 2308.35 99.88 99.79 99.97 1.00
0 2800 81.05 1853.15 99.90 99.87 99.93 1.00
0 3000 52.67 1462.61 99.96 99.93 99.99 1.00
0 3200 57.99 1154.62 99.94 99.91 99.97 1.00
0 3400 110.74 847.50 99.90 99.85 99.96 1.00
0 3600 90.49 621.99 99.90 99.91 99.90 1.00
0 3800 51.03 378.93 99.87 99.78 99.97 1.00
0 4000 93.40 274.80 99.95 99.93 99.97 1.00
0 4200 138.98 203.28 99.91 99.87 99.96 1.00
0 4400 106.16 127.60 99.70 99.75 99.64 1.00
0 4600 70.28 87.25 99.95 99.94 99.96 1.00
✔ Saved pipeline to output directory
training/model-last
I was trying to train my model for 2 epochs (max_epochs = 2). My train file has around 123591 examples and my dev file has 2522 examples.
My questions are:

- Since my minimum batch size is 100, I would expect my training to end before the evaluation at step 2400, right? Because reaching step 2400 implies a minimum of 2400 * 100 = 240000 examples, and it would actually be even more than that, since my batch size keeps increasing. So why did it go all the way to step 4600?
- The training ended automatically, but the E column still reads epoch 0. Why is that?
Edit: Following up on my 2nd bullet point, I'm curious why the training went all the way up to 4600 batches, because 4600 batches at a minimum size of 100 means 4600 * 100 = 460000 examples, and I only gave 123591 examples for training, so I'm clearly well past the 1st epoch, yet E still reads 0.
There's an entry for this in the FAQ, but to summarize:

- max_steps is the maximum number of steps, i.e. batches (not evaluation iterations).
- max_epochs is the maximum number of epochs.
- patience: after this many steps without improvement in the evaluation score, training stops. That is what stopped your training.

It seems like your model has already gotten a perfect score, so I'm not sure why early stopping is a problem in this case, but that's what's happening.