Tags: nlp, bert-language-model

Can I fine-tune BERT using only masked language model and next sentence prediction?


So if I understand correctly, there are mainly two ways to adapt BERT to a specific task: fine-tuning (all weights are changed, even the pretrained ones) and the feature-based approach (the pretrained weights are frozen and only used to extract features). However, I am confused.

  1. When to use which one? If you have unlabeled data (unsupervised learning), should you then use fine-tuning?
  2. If I want to fine-tune BERT, isn't the only option to do that using masked language modeling and next sentence prediction? And is it necessary to put another neural network layer on top?
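
To check my understanding, here is a minimal sketch of the difference between the two approaches, assuming PyTorch and the Hugging Face transformers library (the model name is just an example):

    # Feature-based: freeze the pretrained encoder and only train whatever
    # task-specific layer is put on top of its outputs.
    from transformers import AutoModel

    encoder = AutoModel.from_pretrained("bert-base-uncased")
    for param in encoder.parameters():
        param.requires_grad = False  # pretrained weights stay fixed

    # Fine-tuning: leave requires_grad as True (the default), so the
    # pretrained weights are updated together with any new task head.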

Thank you.


Solution

  • Your first approach should be to try the pre-trained weights. Generally they work well. However, if you are working in a different domain (e.g. medicine), then you'll need to fine-tune on data from the new domain. Then again, you may be able to find models already pre-trained on that domain (e.g. BioBERT).
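
    If you do fine-tune on a new domain, the usual recipe is to continue pre-training with the masked-language-model objective on unlabeled domain text. Here is a minimal sketch, assuming the Hugging Face transformers and datasets libraries (the file name domain_corpus.txt is just a placeholder for your own corpus):

        from datasets import load_dataset
        from transformers import (
            AutoModelForMaskedLM,
            AutoTokenizer,
            DataCollatorForLanguageModeling,
            Trainer,
            TrainingArguments,
        )

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

        # One line of raw, unlabeled domain text per row.
        dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

        def tokenize(batch):
            return tokenizer(batch["text"], truncation=True, max_length=128)

        tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

        # Masks 15% of tokens on the fly, reproducing BERT's MLM objective.
        collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir="bert-domain-adapted", num_train_epochs=1),
            train_dataset=tokenized,
            data_collator=collator,
        )
        trainer.train()

    The same recipe applies if you start from a domain checkpoint such as BioBERT instead of the generic BERT weights.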

    As for adding a layer, there are slightly different approaches depending on your task. For question answering, for example, have a look at the TANDA paper (Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection). It is a very readable paper that explains the transfer-and-adaptation strategy. Also, Hugging Face provides modified, pre-trained models for most of the standard tasks.
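
    For a standard supervised task, the extra layer usually comes with those task-specific model classes, so you rarely have to wire it up yourself. A minimal sketch, assuming PyTorch and Hugging Face transformers, with a toy batch standing in for a real labeled dataset:

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        # Adds a randomly initialized classification layer on top of the pooled
        # [CLS] representation; the pretrained encoder and the new head are
        # both updated during fine-tuning.
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2
        )

        texts = ["the drug reduced symptoms", "no effect was observed"]
        labels = torch.tensor([1, 0])

        inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        outputs = model(**inputs, labels=labels)

        optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
        outputs.loss.backward()   # loss is computed against the supplied labels
        optimizer.step()          # one gradient step over head + encoder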