Is it recommended to train my own models for things like sentiment analysis, despite only having a very small dataset (5000 reviews), or is it best to use pretrained models which were trained on way larger datasets, however aren't "specialized" on my data.
Also, how could I train my model on my data and then later use it on it too? I was thinking of an iterative approach where the training data would be a randomly selected subset of my total data for each training epoch.
Be careful to distinguish between pre-training and fine-tuning.
For pre-training you need a huge amount of text (billions of characters); it is very resource-demanding, and you typically don't want to do it unless you have a very good reason (for example, a model for your target language does not exist).
Fine-tuning requires far fewer examples (some tens of thousands), typically takes less than a day on a single GPU, and lets you exploit a pre-trained model created by someone else.
From what you write, I would go with fine-tuning.
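For a concrete picture, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the checkpoint name (distilbert-base-uncased), the reviews.csv file, and the text/label column names are assumptions for illustration, not something from your setup:

    # Minimal fine-tuning sketch with Hugging Face transformers + datasets.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    # Assumed: a CSV with "text" and "label" (0/1) columns holding your ~5000 reviews.
    dataset = load_dataset("csv", data_files="reviews.csv")["train"]
    dataset = dataset.train_test_split(test_size=0.2)  # hold out data for evaluation

    model_name = "distilbert-base-uncased"  # any pre-trained checkpoint works here
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    dataset = dataset.map(tokenize, batched=True)

    training_args = TrainingArguments(
        output_dir="checkpoints",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )
    trainer.train()

With only 5000 reviews, keep an eye on the evaluation loss: small datasets overfit quickly, so a few epochs are usually enough.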
Of course you can save the model for later, as you can see in the tutorial I linked above:
model.save_pretrained("my_imdb_model")
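Later, in a new session, you can reload that directory and run predictions. A minimal sketch, assuming the tokenizer was also saved to the same directory (tokenizer.save_pretrained("my_imdb_model")):

    # Reload the fine-tuned model for inference in a later session.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumes both model and tokenizer were saved to "my_imdb_model".
    tokenizer = AutoTokenizer.from_pretrained("my_imdb_model")
    model = AutoModelForSequenceClassification.from_pretrained("my_imdb_model")
    model.eval()

    inputs = tokenizer("This product exceeded my expectations!", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = logits.argmax(dim=-1).item()  # 0 or 1, per your label encoding
    print(prediction)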