Tags: python, huggingface-transformers, huggingface-tokenizers

How to cache HuggingFace model and tokenizer


I'm using the Hugging Face model distilbert-base-uncased and the tokenizer DistilBertTokenizerFast, and I'm currently loading them with .from_pretrained().

I want to cache them so that they work without internet access as well.

I tried the cache_dir parameter of from_pretrained(), but it didn't work.

Any suggestions?


Solution

  • I solved the problem with these steps:

    1. Use .from_pretrained() with cache_dir=RELATIVE_PATH to download the files.
    2. Inside the RELATIVE_PATH folder you will find pairs of files: a hashed data file and a companion .json file. Open the .json file; at the end of its url field you will see the real file name, e.g. config.json. Copy this name.
    3. Rename the hashed file that accompanies that .json file to the name you copied (config.json in our example).
    4. Repeat these steps for the other files.
    5. Run .from_pretrained(RELATIVE_PATH, local_files_only=True) for your model/tokenizer.

    This solution should work