Tags: python, tensorflow, lstm, word2vec

When training a model, is there a difference between training on a small amount of data multiple times and on a large amount of data once?


I already have a model that has been trained on 130,000 sentences.

I want to categorize sentences with a bidirectional LSTM and deploy this as a service. However, the model must continue to be trained while the service is running.
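
For context, this is roughly the kind of model I mean; a minimal sketch assuming Keras with frozen word2vec embeddings (all sizes and names here are illustrative, not my real ones):

```python
import numpy as np
from tensorflow.keras import initializers, layers, models

vocab_size = 20000   # illustrative vocabulary size
embed_dim = 300      # word2vec vector dimension
num_classes = 5      # illustrative number of sentence categories

# In practice this matrix holds the word2vec vector for each vocabulary
# word; random values stand in here.
embedding_matrix = np.random.rand(vocab_size, embed_dim)

model = models.Sequential([
    layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=initializers.Constant(embedding_matrix),
        trainable=False),                       # keep word2vec vectors fixed
    layers.Bidirectional(layers.LSTM(128)),     # reads each sentence both ways
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```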

So I think that, until the model's accuracy improves, I will review the sentences the model has categorized and correct the answers myself.

Then I will train the model on each sentence with the corrected answer.
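
The incremental part would look something like this, continuing from the model sketch above (`encode` is a hypothetical helper I made up for illustration, not a real library function):

```python
import numpy as np

max_len = 50         # fixed sentence length the model expects
vocab_size = 20000   # must match the embedding layer above

def encode(sentence, max_len=max_len):
    # Hypothetical helper: map words to vocabulary indices and pad.
    # A hash stands in for a real word2vec vocabulary lookup.
    ids = [hash(w) % vocab_size for w in sentence.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

# Each time I correct a sentence the model misclassified,
# run one training step on just that example.
new_x = np.array([encode("a sentence the model got wrong")])
new_y = np.array([3])                  # my corrected category label
model.train_on_batch(new_x, new_y)     # single incremental gradient update
```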

Is there a difference between training on the sentences one by one and merging them into one file and training on them all at once?

Does it matter if I feed in one sentence at a time for training?


Solution

  • Yes, there is a difference. Suppose you have a dataset of 10,000 sentences.

    • If you train on one sentence at a time, an optimization step (backpropagation) happens after every single sentence. This takes far more time, and the gradient computed on a single instance is noisy, so convergence is slow. It is not a good choice, and it is impractical for a large dataset.
    • If you train in batches, say with a batch size of 1,000, you have 10 batches. Each batch is fed through the network and the gradient is computed over the whole batch. Averaging over a batch smooths out the per-sample noise while keeping enough stochasticity to escape poor local minima, and each epoch needs far fewer update steps, so training is faster and more memory-efficient (see the sketch below).
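
In Keras terms, the two options differ only in `batch_size`; a minimal sketch with stand-in data (the model and all sizes here are illustrative):

```python
import numpy as np
from tensorflow.keras import layers, models

# Toy stand-in for the sentence classifier; sizes are illustrative.
model = models.Sequential([
    layers.Embedding(20000, 300),
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 10,000 padded sentences and their labels (random stand-ins).
X = np.random.randint(0, 20000, size=(10000, 50))
y = np.random.randint(0, 5, size=(10000,))

# One sentence per update: 10,000 noisy gradient steps per epoch.
# model.fit(X, y, batch_size=1, epochs=1)

# Mini-batches of 1,000: 10 averaged gradient steps per epoch,
# faster and with smoother convergence.
model.fit(X, y, batch_size=1000, epochs=1)
```

With `batch_size=1` each epoch performs 10,000 weight updates; with `batch_size=1000` it performs 10, which is why the batched version is so much faster per epoch.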
