I am trying to run GPT-2 on my local machine because Google restricted my resources after I trained for too long in Colab.
However, I cannot see how to load the dataset. The original Colab notebook https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce uses the command gpt2.copy_file_from_gdrive(), which I cannot use on my local machine.
In the GitHub repo https://github.com/minimaxir/gpt-2-simple they simply pass the file name shakespeare.txt to the function gpt2.finetune and it somehow works, but this doesn't work for me.
Help would be much appreciated.
If I read the example on GitHub correctly, it loads shakespeare.txt if it is present on the machine and downloads it if it isn't. For a local dataset, I simply drop a txt file in the same folder and put its name in file_name.
You should be able to remove the logic around if not os.path.isfile(file_name): since it shouldn't be needed if you use a local file.
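For reference, here is a rough sketch of what that could look like with a local file. It follows the calls shown in the gpt-2-simple README (download_gpt2, start_tf_sess, finetune). The file name my_corpus.txt, the helper names resolve_dataset and finetune_on_local_file, and the model name and step count are placeholders I made up for illustration, so adjust them to your setup.

```python
import os


def resolve_dataset(file_name: str) -> str:
    """Return the absolute path of a local text file, or fail early.

    Hypothetical helper: it replaces the download-if-missing check from
    the README example with a plain "the file must already exist" check.
    """
    path = os.path.abspath(file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Dataset not found: {path}")
    return path


def finetune_on_local_file(file_name: str, model_name: str = "124M", steps: int = 1000):
    """Fine-tune GPT-2 on a local text file (sketch, not tested on your data)."""
    # Imported here so the path check above works without TensorFlow installed.
    import gpt_2_simple as gpt2

    dataset = resolve_dataset(file_name)

    # Only the pretrained model still needs a download; the dataset is local.
    if not os.path.isdir(os.path.join("models", model_name)):
        gpt2.download_gpt2(model_name=model_name)

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset, model_name=model_name, steps=steps)
    return sess


# Example call (assumes my_corpus.txt sits next to this script):
# sess = finetune_on_local_file("my_corpus.txt")
```

If the file is in a different folder, passing a full path instead of just the name should work as well, since the file name is ultimately just a path that gets opened.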