python machine-learning scikit-learn digits

Predicting numbers using sklearn digits dataset - error


I want to build a simple digit prediction model.

To do that, I:

  1. Load the sklearn digits dataset
  2. Create a DecisionTreeClassifier()
  3. Fit it to the data
  4. Predict the new image

    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn import tree
    digits = datasets.load_digits()
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(digits.data, digits.target)
    clf.predict(digits.data[-1])

What did I do wrong?

ValueError                                Traceback (most recent call last)
<ipython-input-9-b58a2a08d39b> in <module>()
----> 1 clf.predict(digits.data[-1])

Solution

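  • The problem is that you passed a 1D array, while predict expects a 2D array of shape (n_samples, n_features). digits.data[-1] is a single sample of shape (64,), not a 2D array with one row.

    The code below avoids the error by predicting on a 2D array of held-out test samples.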

    from sklearn import datasets
    from sklearn import tree
    from sklearn.model_selection import StratifiedKFold
    
    # load the digits dataset
    digits = datasets.load_digits()
    
    # separate features and labels
    X_digits = digits.data
    y_digits = digits.target
    
    # split data into training and testing sets; each iteration overwrites
    # the previous split, so only the last of the 10 folds is kept
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for train_index, test_index in k_fold.split(X_digits, y_digits):
        train_features, test_features = X_digits[train_index], X_digits[test_index]
        train_labels, test_labels = y_digits[train_index], y_digits[test_index]
    
    # fit the model on the training features and labels
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(train_features, train_labels)
    
    # predict on the testing features
    print(clf.predict(test_features))
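
The loop above trains and predicts on the final fold only. If the goal is an accuracy estimate over all ten folds, one possible sketch uses cross_val_score from sklearn.model_selection. This is not part of the original answer; the random_state on the classifier is an assumption added only to make the run repeatable.

```python
from sklearn import datasets, tree
from sklearn.model_selection import StratifiedKFold, cross_val_score

# load the digits dataset
digits = datasets.load_digits()

# same splitter settings as in the answer above
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# random_state is an assumption, added for repeatable results
clf = tree.DecisionTreeClassifier(random_state=42)

# fit and score a fresh copy of the classifier on each of the 10 folds
scores = cross_val_score(clf, digits.data, digits.target, cv=k_fold)

print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

Each entry in scores is the accuracy on one held-out fold, so the mean is a less noisy estimate than a single split.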
    

    Also, have a look at this. It might provide you with further information.
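
To keep the original one-line prediction, the single sample can be turned into a 2D array with one row before calling predict. A minimal sketch, assuming the classifier is trained on the full dataset as in the question (the random_state argument is an assumption added for repeatability):

```python
from sklearn import datasets, tree

# load the digits dataset and fit the classifier as in the question
digits = datasets.load_digits()
clf = tree.DecisionTreeClassifier(random_state=0)  # random_state is an assumption
clf.fit(digits.data, digits.target)

# a single sample is 1D: shape (64,)
sample = digits.data[-1]

# reshape to one row with 64 features: shape (1, 64)
sample_2d = sample.reshape(1, -1)
prediction = clf.predict(sample_2d)

# slicing instead of indexing also keeps the array 2D
prediction_from_slice = clf.predict(digits.data[-1:])

print(sample.shape, sample_2d.shape)
print(prediction, prediction_from_slice)
```

Note that the last image is part of the training data here, so this only demonstrates the input shape and says nothing about how well the model generalizes.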