I have a custom tokenizer and want to use it for prediction in a production API. How do I save/download the tokenizer?
This is my code trying to save it:
import pickle
from tensorflow.python.lib.io import file_io
with file_io.FileIO('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
No error, but I can't find the tokenizer after saving it. So I assume the code didn't work?
Here is the situation, using a simple file to disentangle the issue from irrelevant specifics like pickle, TensorFlow, and tokenizers:
# Run in a new Colab notebook:
%pwd
/content
%ls
sample_data/
Let's save a simple file foo.npy:
import numpy as np
np.save('foo', np.array([1,2,3]))
%ls
foo.npy sample_data/
At this stage, %ls should show tokenizer.pickle in your case instead of foo.npy.
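If you prefer to check programmatically rather than eyeball the %ls output, a minimal sketch (the path simply mirrors the working directory shown above):

import os
# the file is written to the Colab VM's local disk, under /content
print(os.path.isfile('/content/tokenizer.pickle'))   # True if the dump succeeded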
Now, Google Drive and Colab do not communicate by default; you have to mount the drive first (it will ask for authorization):
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
After which, an %ls command will give:
%ls
drive/ foo.npy sample_data/
and you can now navigate (and save) inside drive/ (i.e. actually in your Google Drive), changing the path accordingly. Anything saved there can be retrieved later.
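Coming back to your tokenizer, a minimal sketch of saving it straight into Drive; the folder name MyDrive and the file name are assumptions here, so adjust them to your own layout:

import pickle
from google.colab import drive

drive.mount('/content/drive')   # prompts for authorization the first time

# hypothetical target path inside your Google Drive; change as needed
save_path = '/content/drive/MyDrive/tokenizer.pickle'

with open(save_path, 'wb') as handle:
    # tokenizer is your existing tokenizer object from the question
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

Alternatively, if all you need is the file on your local machine, files.download('tokenizer.pickle') (after from google.colab import files) will push it through the browser, and you can then upload it to your production environment by hand.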