openai-api

How can I test my OpenAI fine-tuned model against question-answering benchmarks?


I think the documentation only explains how to use the model through the API, which does not seem to allow much flexibility or automation. For example, I do not know how to test my model against popular benchmarks from Hugging Face.


Solution

  • The general flow for fine-tuning OpenAI models is to create an account, obtain a valid API key, and then upload your fine-tuning data using the CLI tool, as described here: https://beta.openai.com/docs/guides/fine-tuning

    Then, to test against a question-answering benchmark such as SQuAD, you download the dataset and write a script that takes each question (see the JSON snippet below) and feeds it to your model by calling the API, as described here (using curl): https://beta.openai.com/docs/api-reference/making-requests

    "question": "What century did the Normans first gain their separate identity?",
    "id": "56ddde6b9a695914005b962c",
    "answers": [
        {
            "text": "10th century",
            "answer_start": 671
        },
        {
            "text": "the first half of the 10th century",
            "answer_start": 649
        },
        {
            "text": "10th",
            "answer_start": 671
        },
        {
            "text": "10th",
            "answer_start": 671
        }
    ],
    "is_impossible": false