I think the documentation only explains how to use the model through an API, which does not allow much flexibility or automation. For example, I do not know how to test my model against some popular benchmarks from HuggingFace.
The general flow of fine-tuning OpenAI models consists of creating an account, obtaining a valid API key, and then uploading the data for fine-tuning using the CLI tool, as described here: https://beta.openai.com/docs/guides/fine-tuning
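As a concrete illustration of the data-preparation step, here is a minimal sketch (the file name and example contents are placeholders, not from the original post) of writing training data in the prompt/completion JSONL format that the fine-tuning guide expects:

```python
import json

# Each training example is one JSON object per line with "prompt" and
# "completion" keys, as described in the fine-tuning guide.
# These example pairs are placeholders for your own data.
examples = [
    {"prompt": "Question: What is the capital of France?\nAnswer:",
     "completion": " Paris"},
    {"prompt": "Question: Who wrote Hamlet?\nAnswer:",
     "completion": " William Shakespeare"},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The resulting file is then uploaded for fine-tuning with the CLI, e.g.:
#   openai api fine_tunes.create -t train.jsonl -m curie
```

The base model name (`curie` here) is just an example; pick whichever base model suits your task.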
Then, to test against question-answering benchmarks such as SQuAD, you simply download the dataset and create a script that takes the questions (see the JSON snippet below) and feeds them to your model by calling the API as described here (using curl): https://beta.openai.com/docs/api-reference/making-requests
```json
{
  "question": "What century did the Normans first gain their separate identity?",
  "id": "56ddde6b9a695914005b962c",
  "answers": [
    { "text": "10th century", "answer_start": 671 },
    { "text": "the first half of the 10th century", "answer_start": 649 },
    { "text": "10th", "answer_start": 671 },
    { "text": "10th", "answer_start": 671 }
  ],
  "is_impossible": false
}
```
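The scripted evaluation described above can be sketched as follows. This is a sketch under assumptions, not a definitive implementation: `build_prompt` and the prompt wording are hypothetical helpers (the prompt format should match whatever you used for fine-tuning), and the model name is a placeholder. Only the completions endpoint and headers from the API docs are taken as given.

```python
import json
import urllib.request

def build_prompt(question):
    # Hypothetical prompt template; it should mirror the format of your
    # fine-tuning data.
    return f"Question: {question}\nAnswer:"

def ask_model(question, api_key, model="curie"):
    # POST to the completions endpoint described in the API reference.
    # The model name and sampling parameters here are placeholders.
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(question),
        "max_tokens": 32,
        "temperature": 0,
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://api.openai.com/v1/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"].strip()

# Example: take the question from a SQuAD-style record like the snippet above
# and turn it into a prompt (the API call itself requires a valid key).
record = {
    "question": "What century did the Normans first gain their separate identity?",
    "answers": [{"text": "10th century", "answer_start": 671}],
    "is_impossible": False,
}
prompt = build_prompt(record["question"])
```

From there you would loop over all records in the downloaded SQuAD file, collect the model's answers, and compare them against the `answers` spans with the benchmark's exact-match or F1 scoring.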