python scikit-learn pymongo gridfs joblib

Cannot load joblib serialized model from GridFS

I can dump sklearn models to GridFS:

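import gridfs
import joblib

fs = gridfs.GridFS(db)  # db is an existing pymongo database
gridFS_file = fs.new_file()
joblib.dump(vectorizer, gridFS_file)
gridFS_file.close()  # finalize the file so GridFS records it as complete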

This works, and I can see the model stored in MongoDB.

But I can't load it directly from GridFS:

from bson.objectid import ObjectId
new_file = fs.get(ObjectId("59df36ebe46a520014e0771d"))
vectorizer2 = joblib.load(new_file)

This takes forever and never finishes. However, writing the contents to a local file first works (and finishes quickly):

with open('vec.pkl', 'wb') as f:
    f.write(new_file.read())
# load only after the with block has closed (and flushed) the file
vectorizer3 = joblib.load("vec.pkl")

What am I missing?

Solution

A better workaround is to read the whole file into memory first and then wrap the bytes in an in-memory stream, which avoids the temporary file on disk:

import io
joblib.load(io.BytesIO(new_file.read()))
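
For anyone who wants to try the buffering step without a MongoDB instance, here is a minimal sketch of the same idea. The helper name buffer_in_memory and the ReadOnlyStream class are hypothetical and only stand in for the GridFS file object, and pickle stands in for joblib so the snippet runs with the standard library alone. It shows the pattern, not the cause of the slow direct load.

```python
import io
import pickle


def buffer_in_memory(file_obj):
    """Read a file-like object in one call and return a seekable in-memory copy."""
    return io.BytesIO(file_obj.read())


class ReadOnlyStream:
    """Hypothetical stand-in for the object returned by fs.get(): exposes only read()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


# Serialize a small object, then load it back through the in-memory buffer.
payload = pickle.dumps({"vocabulary": ["a", "b"]})
stream = buffer_in_memory(ReadOnlyStream(payload))
restored = pickle.load(stream)
```

With the real objects from the question, the equivalent call would be joblib.load(buffer_in_memory(fs.get(file_id))), where file_id is the ObjectId of the stored model. Note that this holds the entire serialized model in memory, which should be fine for a vectorizer but may matter for very large models.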
