python, jupyter-notebook, google-colaboratory, gpt-2

Train GPT-2 on local machine, load dataset


I am trying to run GPT-2 on my local machine because Google restricted my resources after I trained for too long in Colab.

However, I cannot see how to load the dataset. The original Colab notebook https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce uses the command gpt2.copy_file_from_gdrive(), which I cannot use on my local machine.

In the GitHub repo https://github.com/minimaxir/gpt-2-simple they simply pass the file name shakespeare.txt to the function gpt2.finetune and it somehow works, but this doesn't work for me.

Help would be much appreciated.


Solution

  • If I read the GitHub example correctly, it uses shakespeare.txt if the file is already present on the machine and downloads it if it isn't. For a local dataset, I simply put a .txt file in the same folder as the script and assign its name to file_name.

    You should be able to remove the logic around if not os.path.isfile(file_name): since that check only triggers the download, which isn't needed when you use a local file.
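
As a rough sketch of the approach described above, the following assumes gpt-2-simple is installed locally and that your dataset is a plain-text file next to the script. The file name my_corpus.txt and the helper names resolve_dataset and finetune_on_local_file are made up for illustration; the gpt2 calls mirror the ones in the repo's README example, and the model name and step count are just the values used there.

```python
import os


def resolve_dataset(file_name):
    """Return the absolute path of a local dataset, or fail with a clear error.

    Hypothetical helper: it takes the place of the README's download
    branch, which is not needed when the file already exists locally.
    """
    path = os.path.abspath(file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Dataset not found: {path}")
    return path


def finetune_on_local_file(file_name, model_name="124M", steps=1000):
    """Fine-tune GPT-2 on a local text file (hypothetical wrapper)."""
    # Imported here so the path check above works without TensorFlow installed.
    import gpt_2_simple as gpt2

    dataset = resolve_dataset(file_name)

    # Fetch the pretrained weights once; they are stored under ./models/<model_name>.
    if not os.path.isdir(os.path.join("models", model_name)):
        gpt2.download_gpt2(model_name=model_name)

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=dataset, model_name=model_name, steps=steps)
    gpt2.generate(sess)
```

Calling finetune_on_local_file("my_corpus.txt") from the folder that contains the file should then train on it. Passing a full path instead of a bare file name should also work if the dataset lives elsewhere.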