I would like to create a dataset for fine-tuning GPT-3. According to https://beta.openai.com/docs/guides/fine-tuning, the dataset should look like this:
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
For this reason, I am creating the dataset as follows:
import json
# Data to be written
dictionary = {
    "prompt": "<text1>", "completion": "<text to be generated1>"}, {
    "prompt": "<text2>", "completion": "<text to be generated2>"}
with open("sample2.json", "w") as outfile:
    json.dump(dictionary, outfile)
However, when I load the file back, I get a single list instead of one JSON object per line, which is not what I want:
import json
# Opening JSON file
with open('sample2.json', 'r') as openfile:
    # Reading from json file
    json_object = json.load(openfile)
print(json_object)
print(type(json_object))
>> [{'prompt': '<text1>', 'completion': '<text to be generated1>'}, {'prompt': '<text2>', 'completion': '<text to be generated2>'}]
<class 'list'>
Could you please let me know how I can solve this problem?
What you need is one JSON object per line: write a newline character (\n) after each JSON object, so that every line is valid JSON on its own. (The jsonlines link gave me a "server not found" error.)
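As a side note on why the file in the question loads back as a list: two dict literals separated by a comma form a tuple, and the `json` module serializes a tuple as one JSON array. The sketch below shows the difference in memory; the variable names `records`, `as_array` and `as_jsonl` are illustrative only and do not appear in the question.

```python
import json

# Two dict literals separated by a comma form a tuple, not one dictionary.
records = (
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
)

# Dumping the whole tuple produces a single JSON array on one line.
as_array = json.dumps(records)

# Dumping each record separately, one per line, produces the line-per-object layout.
as_jsonl = "\n".join(json.dumps(record) for record in records) + "\n"

print(as_array)
print(as_jsonl, end="")
```

The second form is the one that matches the layout shown in the question: each line parses to a dict by itself.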
You have these options:
Write \n after each object yourself:
import json
with open("sample2_op1.json", "w") as outfile:
    for e_json in dictionary:
        json.dump(e_json, outfile)
        outfile.write('\n')
# Read the file: every record ends with \n, so read line by line and parse each line as JSON
with open("sample2_op1.json", "r") as file:
    for line in file:
        record = json.loads(line)
        print(record, type(record))
Or use the jsonlines package:
!pip install jsonlines
import jsonlines
# Write to file
with jsonlines.open('sample2_op2.jsonl', 'w') as outfile:
    outfile.write_all(dictionary)
# Read the file
with jsonlines.open('sample2_op2.jsonl') as reader:
    for obj in reader:
        print(obj)
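Whichever option you pick, it may help to check the result before uploading it. Below is a minimal sketch using only the standard library. The helper name `find_jsonl_problems` is made up for this example, and the check (every line is a JSON object containing "prompt" and "completion") is an assumption based on the layout shown in the question, not an official validator.

```python
import json

# Keys every record is assumed to need, taken from the layout in the question.
REQUIRED_KEYS = {"prompt", "completion"}


def find_jsonl_problems(path):
    """Return a list of (line_number, message) for lines that do not fit the layout."""
    problems = []
    with open(path, "r", encoding="utf-8") as file:
        for line_number, line in enumerate(file, start=1):
            if not line.strip():
                problems.append((line_number, "empty line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((line_number, f"invalid JSON: {error.msg}"))
                continue
            if not isinstance(record, dict):
                # For example, a whole list dumped on one line.
                problems.append((line_number, "not a JSON object"))
                continue
            missing = REQUIRED_KEYS - set(record)
            if missing:
                problems.append((line_number, f"missing keys: {sorted(missing)}"))
    return problems
```

Calling `find_jsonl_problems("sample2_op1.json")` on a file written with one of the options above should return an empty list, while the original sample2.json (a single array on one line) should be reported as "not a JSON object".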