Tags: python, class, scikit-learn, pickle

Why does my pickled class give me AttributeError: Can't get attribute 'MyClass'?


I tried running this on two different computers, but whenever I try to load the pickle file, I get: AttributeError: Can't get attribute 'MyClass'

I'm trying to understand what I am doing wrong, or whether I have misunderstood what pickle does. How can I fix it?

I also wonder why scikit-learn works: I just call xyz.fit(data), and the model works after loading the pickle, whereas my code calling xyz.add_one(5) gives me an error.

My own class, creating the pickle:

import base64
import pickle

class MyClass:
    def add_one(self, x):
        return x + 1

obj = MyClass()

add_one = obj

pickled_data = pickle.dumps(add_one)

encoded_data = base64.b64encode(pickled_data)

with open("some_location.txt", "wb") as f:
    f.write(encoded_data)

Loading the pickle on another computer:

import base64
import pickle

# Read the encoded data from the file
with open("some_location.txt", "rb") as f:
    encoded_data = f.read()

pickled_data = base64.b64decode(encoded_data)

loaded = pickle.loads(pickled_data)

result = loaded.add_one(5)
print(result)  # Expected output: 6
# Actual result: pickle.loads above raises AttributeError: Can't get attribute 'MyClass'

Meanwhile, when I do the same thing with scikit-learn, I am able to call the methods of the loaded object without any issue. Example:

from sklearn import svm
from sklearn import datasets
import pickle

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = svm.SVC()
clf.fit(X, y)  

with open("/dbfs/FileStore/tables/tmp/test.txt",'wb') as f:
    pickle.dump(clf,f)

On a separate computer:

import pickle

X = [[5.1, 3.5, 1.4, 0.2],
     [4.9, 3, 1.4, 0.2],
     [4.7, 3.2, 1.3, 0.2],
     [4.6, 3.1, 1.5, 0.2],
     [5, 3.6, 1.4, 0.2]]

with open('/dbfs/FileStore/tables/tmp/test.txt', 'rb') as f:
    clf2 = pickle.load(f)


print(clf2.predict(X[0:5]))

Solution

  • The full error output is helpful here. If we put the first script in t.py and the second in y.py and run them one after the other, we get:

    Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/private/tmp/tmpvenv-ec8f9/y.py", line 10, in <module>
        loaded = pickle.loads(pickled_data)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: Can't get attribute 'MyClass' on <module 'y' from '/private/tmp/tmpvenv-ec8f9/y.py'>
    

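    The last line shows us that Python is trying to rehydrate the object and is looking for a MyClass type to use to recreate obj. A pickle stores only a reference to the class (its module and name), not the class's code. Because MyClass was defined in the script that was run directly, that reference points at the main script, so Python looks for the class in the file we're running (on <module 'y' from '/private/tmp/tmpvenv-ec8f9/y.py'>). But, of course, MyClass doesn't exist there.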

    The problem here is that the object being pickled has a type that cannot be found when we try to load it again. The code is likely working for you when you pickle a sklearn.svm.SVC instance because that class lives in a library you have installed in both environments, so pickle can import it whenever you load the pickled object. (Can you confirm that?)
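
To see what a pickle actually records, here is a minimal sketch. It builds a throwaway module in memory so the demo is self-contained; the module name "producer" is hypothetical and stands in for whichever script created the pickle (in the question's scripts that module is `__main__`). The byte stream contains the module name and the class name, but nothing from the class body:

```python
import pickle
import sys
import types

# Stand-in for the script that wrote the pickle. "producer" is a
# hypothetical module name used only for this demo.
source = '''
class MyClass:
    def add_one(self, x):
        return x + 1
'''
producer = types.ModuleType("producer")
exec(source, producer.__dict__)
sys.modules["producer"] = producer  # make the module findable by name

pickled_data = pickle.dumps(producer.MyClass())

# The stream records where to find the class...
has_module_name = b"producer" in pickled_data
has_class_name = b"MyClass" in pickled_data
# ...but nothing from the class body itself.
has_method_name = b"add_one" in pickled_data

print(has_module_name, has_class_name, has_method_name)  # True True False
```

So on loading, pickle has to import that module and fetch the class by name; if either step fails, you get an error like the one above.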


    We can make this work (although possibly with surprising results) by adding a MyClass to y.py. If we add in:

      import base64
      import pickle
    
    + class MyClass:
    +     def add_one(self, x):
    +         return x + 10
    
    
      # Read the encoded data from the file
      with open("some_location.txt", "rb") as f:
          encoded_data = f.read()
    
      pickled_data = base64.b64decode(encoded_data)
    
      loaded = pickle.loads(pickled_data)
    
      result = loaded.add_one(5)
      print(result)
    

    Then when we run it we get:

    $ python -m y
    15
    

    The pickled object has been loaded back, but with the MyClass type defined in y.py, which does something different from the original!
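
One common way to avoid both the error and the surprising result is to keep the class in a module that both scripts can import, which is essentially what happens with scikit-learn. The sketch below simulates this on one machine: the file name shared_models.py, the temporary directory, and the second interpreter started with subprocess are all assumptions made for the demo, not part of the original scripts.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical shared module, shipped to both machines.
MODULE_SOURCE = '''
class MyClass:
    def add_one(self, x):
        return x + 1
'''

# The loading script: it never defines or imports MyClass itself.
LOADER_SOURCE = '''
import pickle
import sys

sys.path.insert(0, sys.argv[1])  # directory containing shared_models.py

with open(sys.argv[2], "rb") as f:
    loaded = pickle.load(f)      # pickle imports shared_models on demand
print(loaded.add_one(5))
'''

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "shared_models.py").write_text(MODULE_SOURCE)
    pickle_path = tmp_path / "obj.pkl"

    # "Computer 1": import the class from the shared module and pickle it.
    sys.path.insert(0, tmp)
    import shared_models
    pickle_path.write_bytes(pickle.dumps(shared_models.MyClass()))
    sys.path.remove(tmp)

    # "Computer 2": a fresh interpreter that only has the module on its path.
    completed = subprocess.run(
        [sys.executable, "-c", LOADER_SOURCE, tmp, str(pickle_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    loader_output = completed.stdout.strip()

print(loader_output)  # 6
```

The loader works because the pickle now refers to shared_models.MyClass rather than to a class in the main script, and the same module file is available on both sides. Both sides still need the same version of that file, or the loaded object will behave like whatever class definition the loader finds.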