I tried running this on two different computers, but whenever I try to load the pickle file, I get:
AttributeError: Can't get attribute 'MyClass'
I'm trying to understand what I am doing wrong, or whether I have a wrong understanding of what pickle does. How can I fix it?
I wonder why SKLearn works when I just call xyz.fit(data) and it works after loading the pickle, whereas my code calling xyz.add_one(5) gives me an error?
MY OWN Class: Creating Pickle:
import base64
import pickle

class MyClass:
    def add_one(self, x):
        return x + 1

obj = MyClass()
add_one = obj
pickled_data = pickle.dumps(add_one)
encoded_data = base64.b64encode(pickled_data)
with open("some_location.txt", "wb") as f:
    f.write(encoded_data)
Loading Pickle on another computer:
import base64
import pickle
# Read the encoded data from the file
with open("some_location.txt", "rb") as f:
    encoded_data = f.read()
pickled_data = base64.b64decode(encoded_data)
loaded = pickle.loads(pickled_data)
result = loaded.add_one(5)
print(result) # Output: 6
# Errored out: AttributeError: Can't get attribute 'MyClass'
Meanwhile, when I do the same thing using SKLearn, I am able to call the methods of the saved class without issue. Example:
from sklearn import svm
from sklearn import datasets
import pickle
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC()
clf.fit(X, y)
with open("/dbfs/FileStore/tables/tmp/test.txt",'wb') as f:
    pickle.dump(clf,f)
On a separate computer:
import pickle
X = [[5.1, 3.5, 1.4, 0.2],
     [4.9, 3, 1.4, 0.2],
     [4.7, 3.2, 1.3, 0.2],
     [4.6, 3.1, 1.5, 0.2],
     [5, 3.6, 1.4, 0.2]]
with open('/dbfs/FileStore/tables/tmp/test.txt', 'rb') as f:
    clf2 = pickle.load(f)
print(clf2.predict(X[0:5]))
The full error output is helpful here. If we put the first script in t.py and the second in y.py, then run one after the other, we get:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/private/tmp/tmpvenv-ec8f9/y.py", line 10, in <module>
    loaded = pickle.loads(pickled_data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'MyClass' on <module 'y' from '/private/tmp/tmpvenv-ec8f9/y.py'>
The last line shows us that Python is trying to rehydrate the object and is looking for a MyClass type to use to recreate obj. So it's looking for that class in the file we're running (on <module 'y' from '/private/tmp/tmpvenv-ec8f9/y.py'>). But, of course, MyClass doesn't exist there.
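The problem here is that the object being pickled has a type that cannot be found when we try to load it again. Pickle does not store the class's code; it only stores a reference to the class (its module and name) plus the instance's state, so the class must be importable wherever the pickle is loaded. The code is likely working for you when you pickle a sklearn.svm.SVC instance because you have that library installed in both environments, so it is always available when you load the pickled object. (Can you confirm that?)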
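To make the point about what pickle stores concrete, here is a minimal sketch that inspects the pickled bytes for a class like the one in the question. It only uses the standard library (pickle and pickletools); the variable names are just for illustration.

```python
import pickle
import pickletools


class MyClass:
    def add_one(self, x):
        return x + 1


payload = pickle.dumps(MyClass())

# The payload refers to the class by name ...
has_class_name = b"MyClass" in payload
# ... but nothing about the method is stored in it.
has_method_name = b"add_one" in payload

print(has_class_name, has_method_name)

# Disassemble the pickle to see the module and class name it refers to.
pickletools.dis(payload)
```

The disassembly should show the module name and 'MyClass' as plain strings, which is all the loading side has to go on when it looks the class up.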
We can make this work (although possibly with surprising results) by adding a MyClass to y.py. If we add in:
  import base64
  import pickle

+ class MyClass:
+     def add_one(self, x):
+         return x + 10

  # Read the encoded data from the file
  with open("some_location.txt", "rb") as f:
      encoded_data = f.read()
  pickled_data = base64.b64decode(encoded_data)
  loaded = pickle.loads(pickled_data)
  result = loaded.add_one(5)
  print(result)
Then when we run it we get:
$ python -m y
15
The pickled object has been brought back, but with the MyClass type defined in y.py, which does something different from the original!
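One common way to avoid both the error and the surprise is to keep the class in a module that both scripts import, which is effectively what happens with sklearn. The sketch below simulates that in a single process: my_shared_module is a hypothetical module name, written to a temporary directory only so the example is self-contained. On real machines it would be a file or installed package present on both computers.

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Hypothetical shared module; in practice this file (or package) would be
# available on both computers.
MODULE_SOURCE = '''
class MyClass:
    def add_one(self, x):
        return x + 1
'''

tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "my_shared_module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# "Computer 1": import the class from the shared module and pickle an instance.
my_shared_module = importlib.import_module("my_shared_module")
payload = pickle.dumps(my_shared_module.MyClass())

# "Computer 2": forget the module, so pickle has to import it by name again.
del sys.modules["my_shared_module"]
loaded = pickle.loads(payload)

result = loaded.add_one(5)
print(result)  # prints 6
```

Because the pickle refers to my_shared_module.MyClass rather than to a class in whichever script happens to be running, loading works as long as that module can be imported, and both sides get the same definition of add_one.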