
Having problems in loading MNIST database


So I am simply trying to load the MNIST database (which I downloaded), train a classifier, and then save the trained model to a file for future use. I tried downloading it directly (through fetch_mldata), but my internet connection seems to be too slow for that, so I am trying to read the database from an externally downloaded file instead, but I get the error shown below the code. Any help would be much appreciated!

from sklearn.externals import joblib
from sklearn import datasets
from skimage.feature import hog
from sklearn.datasets import fetch_mldata
from sklearn.svm import LinearSVC
import numpy as np
import mlab
import scipy.io

print 'fetching'
dataset = scipy.io.loadmat('mnist-original.mat')

print 'fetched'
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
list_hog_fd = []
for feature in features:
    fd = hog(feature.reshape((28, 28)), orientations=9, pixels_per_cell=(14, 14), cells_per_block=(1, 1), visualise=False)
    list_hog_fd.append(fd)
hog_features = np.array(list_hog_fd, 'float64')
clf=LinearSVC()
clf.fit(hog_features,labels)
joblib.dump(clf, "digits_cls.pkl", compress=3)

When I run it, I get this error:

Traceback (most recent call last):
  File "/home/samad/Red_Queen/v2(ud&ay)/scratch2.py", line 14, in <module>
    features = np.array(dataset.data, 'int16')
AttributeError: 'dict' object has no attribute 'data'

To be honest, I am not very familiar with NumPy arrays or with handling .mat files.
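(For reference, the immediate cause of the AttributeError: scipy.io.loadmat returns a plain dict keyed by the MATLAB variable names, not an object with attributes, so the arrays have to be indexed with dict keys. A minimal sketch of that fix, using a tiny stand-in .mat file since the mldata copy of mnist-original.mat stores 'data' as a 784 x 70000 array and 'label' as 1 x 70000:)

```python
import numpy as np
import scipy.io

# Stand-in for mnist-original.mat: same layout as the real file, where
# 'data' is 784 x 70000 (one column per image) and 'label' is 1 x 70000.
scipy.io.savemat('mnist-demo.mat', {
    'data': np.zeros((784, 5), dtype=np.uint8),
    'label': np.arange(5, dtype=np.float64).reshape(1, 5),
})

# loadmat returns a dict, so index it -- dataset.data raises AttributeError.
dataset = scipy.io.loadmat('mnist-demo.mat')
features = np.array(dataset['data'].T, 'int16')    # transpose: one row per image
labels = np.array(dataset['label'].ravel(), 'int')  # flatten 1 x N to N

print(features.shape)  # (5, 784)
print(labels)          # [0 1 2 3 4]
```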


Solution

  • So, it took some time, but I found a fix anyway. Instead of using scipy to load the '.mat' file, I used the auto-downloader, i.e. this code, to download the database directly:

    dataset = datasets.fetch_mldata("MNIST Original")
    

    The trick is that I placed the externally downloaded file in scikit-learn's cache folder, so it won't have to download it. The path to the cache directory is:

    ~/scikit_learn_data/mldata/ 
    

    You can find the original .mat file through a Google search.

    I used this link: Link to Download, which may or may not still work.
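    The cache trick above can be sketched like this. A dummy file stands in for the real download, and the local filename is just a placeholder for wherever you saved it:

```python
import os
import shutil

# Placeholder name for the manually downloaded database.
local_copy = 'mnist-original.mat'
with open(local_copy, 'wb') as f:
    f.write(b'placeholder')  # stand-in for the real .mat file

# fetch_mldata checks this cache before hitting mldata.org, so a file
# already present there is loaded without any network access.
cache_dir = os.path.expanduser('~/scikit_learn_data/mldata/')
if not os.path.isdir(cache_dir):
    os.makedirs(cache_dir)
shutil.copy(local_copy, os.path.join(cache_dir, 'mnist-original.mat'))

print(os.path.exists(os.path.join(cache_dir, 'mnist-original.mat')))  # True

# With the file in place, this call no longer downloads anything:
# from sklearn.datasets import fetch_mldata
# dataset = fetch_mldata('MNIST original')
```

    Note that fetch_mldata was deprecated and removed in scikit-learn 0.22; on newer versions, fetch_openml('mnist_784') is the replacement.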