Tags: python, nlp, huggingface-transformers, nlp-question-answering

Which is better: custom-training a BERT model or using the pretrained model?


I have been coding my own models for some time, but I recently discovered Hugging Face and started using it. I want to know whether I should use a pretrained model as-is or train the same Hugging Face model on my own dataset. I am trying to build a question answering model.

I have a dataset of 10k-20k questions.


Solution

  • The state-of-the-art approach is to take a model that was pre-trained on tasks relevant to your problem and fine-tune it on your dataset.

    So, assuming your dataset is in English, you should take a model pre-trained on English natural-language text and then fine-tune it on your question-answering data.

    This will most likely work better than training from scratch, but you can experiment on your own. You can also load the same architecture without the pre-trained weights in Hugging Face and train it from scratch for comparison (see the sketch below).
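As an illustration, here is a minimal sketch of both routes with the Hugging Face Transformers Trainer API: fine-tuning bert-base-uncased for extractive question answering, and instantiating the same architecture with randomly initialized weights. The model name, hyperparameters, and the tiny inline example are placeholder assumptions; in practice you would plug in your 10k-20k question dataset.

```python
# Minimal sketch: fine-tuning a pretrained BERT for extractive QA vs.
# the same architecture without pretrained weights. Placeholders throughout.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoConfig,
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"  # assumed model; any English encoder works
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Option 1: start from pretrained weights and fine-tune (recommended).
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Option 2: same architecture, randomly initialized weights, for a
# from-scratch comparison.
config = AutoConfig.from_pretrained(model_name)
scratch_model = AutoModelForQuestionAnswering.from_config(config)


class QADataset(Dataset):
    """SQuAD-style examples: question, context, and the answer span."""

    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ex = self.examples[idx]
        enc = tokenizer(
            ex["question"],
            ex["context"],
            truncation=True,
            max_length=384,
            padding="max_length",
            return_offsets_mapping=True,
        )
        # Map the answer's character span to token start/end positions,
        # looking only at context tokens (sequence id 1).
        start_char = ex["answer_start"]
        end_char = start_char + len(ex["answer"])
        seq_ids = enc.sequence_ids()
        start_tok = end_tok = 0
        for i, (s, e) in enumerate(enc["offset_mapping"]):
            if seq_ids[i] != 1:
                continue
            if s <= start_char < e:
                start_tok = i
            if s < end_char <= e:
                end_tok = i
        enc.pop("offset_mapping")
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["start_positions"] = torch.tensor(start_tok)
        item["end_positions"] = torch.tensor(end_tok)
        return item


# Tiny placeholder dataset; replace with your own 10k-20k examples.
train_dataset = QADataset([
    {
        "question": "Where is the Eiffel Tower?",
        "context": "The Eiffel Tower is located in Paris.",
        "answer": "Paris",
        "answer_start": 31,
    },
])

trainer = Trainer(
    model=model,  # swap in scratch_model to compare training from scratch
    args=TrainingArguments(
        output_dir="qa-finetuned",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=3e-5,
    ),
    train_dataset=train_dataset,
)
trainer.train()
```

With only 10k-20k examples, the fine-tuned pretrained model will almost certainly outperform the randomly initialized variant, which is why starting from pretrained weights is the default choice.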