I have pickled an SMS spam prediction model using pickle. Now I want to use Pyodide to load the model in the browser.
I have loaded the pickled file using pickle.loads in the browser:
console.log("Pyodide loaded, downloading pretrained ML model...")
const model = (await blobToBase64(await (await fetch("/model.pkl")).blob())).replace("data:application/octet-stream;base64,", "")
console.log("Loading model into Pyodide...")
await pyodide.loadPackage("scikit-learn")
await pyodide.loadPackage("joblib")
pyodide.runPython(`
import base64
import pickle
from io import BytesIO
classifier, vectorizer = pickle.loads(base64.b64decode('${model}'))
`)
This works.
But when I try to call:
const prediction = pyodide.runPython(`
vectorized_message = vectorizer.transform(["Call +172949 if you want to get $1000 immediately!!!!"])
classifier.predict(vectorized_message)[0]
`)
It raises an error in vectorizer.transform: AttributeError: format not found
The full error dump is below.
Uncaught (in promise) Error: Traceback (most recent call last):
File "/lib/python3.8/site-packages/pyodide/_base.py", line 70, in eval_code
eval(compile(mod, "<exec>", mode="exec"), ns, ns)
File "<exec>", line 2, in <module>
File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1899, in transform
return self._tfidf.transform(X, copy=False)
File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1513, in transform
X = X * self._idf_diag
File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 319, in __mul__
return self._mul_sparse_matrix(other)
File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 478, in _mul_sparse_matrix
other = self.__class__(other) # convert to this format
File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 28, in __init__
if arg1.format == self.format and copy:
File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 525, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: format not found
_hiwire_throw_error https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
__runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
_runPythonInternal https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
<anonymous> http://localhost/:41
async* http://localhost/:46
pyodide.asm.js:8:39788
It works fine in regular Python, though.
What might I be doing wrong?
It's likely a pickle portability issue. Pickles should be portable between architectures¹ (here amd64 and wasm32); however, they are not portable across package versions. This means that package versions should be identical between the environment where you train your model and the one where you run inference (Pyodide).
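One way to make such a mismatch fail loudly instead of surfacing later as a confusing error is to store the package versions next to the model and check them before unpickling it. This is a minimal sketch, not a standard scikit-learn or Pyodide feature: the helper names `collect_versions`, `dump_with_versions` and `load_checked` are hypothetical, and the package list is an assumption you would adjust to your model.

```python
import importlib
import pickle
from typing import Any, Dict, Iterable, Optional

# Hypothetical default: packages whose versions we assume matter for this model.
DEFAULT_PACKAGES = ("numpy", "scipy", "sklearn")


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each module name to its __version__ (None if missing or unversioned)."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def dump_with_versions(obj: Any, packages: Iterable[str] = DEFAULT_PACKAGES) -> bytes:
    """Pickle obj, wrapped together with the package versions it was trained with."""
    bundle = {
        "versions": collect_versions(packages),
        # The model is pickled separately so it is only unpickled after the check.
        "payload": pickle.dumps(obj),
    }
    return pickle.dumps(bundle)


def load_checked(data: bytes) -> Any:
    """Unpickle a bundle from dump_with_versions, refusing on any version mismatch."""
    bundle = pickle.loads(data)  # outer bundle holds only dicts, str, None and bytes
    expected = bundle["versions"]
    actual = collect_versions(expected)
    mismatches = {
        name: (expected[name], actual[name])
        for name in expected
        if expected[name] != actual[name]
    }
    if mismatches:
        details = ", ".join(
            f"{name}: trained with {want}, found {got}"
            for name, (want, got) in mismatches.items()
        )
        raise RuntimeError(f"package version mismatch ({details})")
    return pickle.loads(bundle["payload"])
```

On the training side you would write `dump_with_versions((classifier, vectorizer))` to model.pkl; inside Pyodide, `classifier, vectorizer = load_checked(base64.b64decode(...))` would then raise a readable RuntimeError listing the differing versions.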
Pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means you will have to build that version of scipy (and possibly numpy) from source in order to train the model, since no Python 3.8 binary wheel exists for such an outdated version of scipy. In the future, this should be resolved by pyodide#1293.
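Before training, it may help to check the training environment against those versions. A minimal sketch, assuming you run it in the training interpreter: the dictionary only contains the three versions named above (numpy is left out because its Pyodide 0.16.1 version is not stated here), and the comparison is a conservative exact string match. The names `PYODIDE_0_16_1_VERSIONS`, `training_env_versions` and `diff_versions` are hypothetical.

```python
import importlib
import platform
from typing import Dict, Optional, Tuple

# Versions bundled with Pyodide 0.16.1, as listed in the answer text.
PYODIDE_0_16_1_VERSIONS = {"python": "3.8.2", "scipy": "0.17.1", "sklearn": "0.22.2"}


def training_env_versions() -> Dict[str, Optional[str]]:
    """Versions in the current (training) interpreter; None if a package is missing."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in ("scipy", "sklearn"):
        try:
            versions[name] = getattr(importlib.import_module(name), "__version__", None)
        except ImportError:
            versions[name] = None
    return versions


def diff_versions(
    target: Dict[str, str], actual: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (target, actual)} for every entry that is not an exact match."""
    return {
        name: (want, actual.get(name))
        for name, want in target.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    problems = diff_versions(PYODIDE_0_16_1_VERSIONS, training_env_versions())
    for name, (want, got) in problems.items():
        print(f"{name}: Pyodide 0.16.1 has {want}, this environment has {got}")
    if not problems:
        print("Training environment matches Pyodide 0.16.1")
```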
The particular error you are getting is likely due to a scipy.sparse version mismatch; see scipy#6533.
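Why would a version mismatch only show up in vectorizer.transform rather than in pickle.loads? Here is a toy illustration in plain Python. NewMatrix, OldMatrix and OldVersionUnpickler are hypothetical stand-ins, not scipy classes, and the assumption (not verified here) is that newer scipy keeps `format` on the class while scipy 0.17.1 sets it per instance in `__init__`, which would be consistent with the traceback above. The key point is that unpickling restores an instance's `__dict__` without calling `__init__`, so an attribute that one version stores on the class and the other expects on the instance silently goes missing.

```python
import io
import pickle


class NewMatrix:
    """Stand-in for a newer class: `format` is a class attribute, not per instance."""

    format = "csr"

    def __init__(self, data):
        self.data = data


class OldMatrix:
    """Stand-in for an older class that sets `format` per instance in __init__."""

    def __init__(self, data):
        self.data = data
        self.format = "csr"

    def __getattr__(self, attr):
        # Only called when normal lookup fails; mimics the "<attr> not found" message.
        raise AttributeError(attr + " not found")


class OldVersionUnpickler(pickle.Unpickler):
    """Resolves NewMatrix to OldMatrix, simulating loading into an older library."""

    def find_class(self, module, name):
        if name == "NewMatrix":
            return OldMatrix
        return super().find_class(module, name)


# Pickle stores the instance __dict__ ({"data": ...}); class attributes are not saved.
blob = pickle.dumps(NewMatrix([1, 2, 3]))

# Loading succeeds, but __init__ is never called, so `format` is never set.
old = OldVersionUnpickler(io.BytesIO(blob)).load()
print(old.data)  # [1, 2, 3]
try:
    print(old.format)
except AttributeError as exc:
    print(exc)  # format not found
```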
¹That said, tree-based models in scikit-learn are at present not portable across architectures, and so won't unpickle in Pyodide. This is a known bug that should be fixed (scikit-learn#19602).