
Fitting Training Labels on a 2D List in Scikit-learn


I am trying to map the rows of a 2D list to the elements of a list of labels with scikit-learn.

For example:

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()

# 2D list of training data (rows have different lengths):
training_data = [[1, 2, 3], [1, 2, 4, 5, 6], [5, 7], [1, 2, 3]]

# 1D list of training labels, one per row:
training_labels = ['a', 'b', 'c', 'a']

clf = clf.fit(training_data, training_labels)

When I run the code, I get "ValueError: setting an array element with a sequence."

How should I transform the data so that I can fit the training data to the training labels?


Solution

  • training_data = [[1, 2, 3], [1, 2, 4, 5, 6], [5, 7], [1, 2, 3]]
    

    If each sublist is one sample, the samples do not all have the same number of features (here 3, 5, 2, and 3). scikit-learn expects a rectangular input of shape (n_samples, n_features), so the model cannot be fit on this data as it stands.

    Also, you probably mean:

     training_labels = ["a", "b", "c", "a"]
    

    Without the quotes, a, b, and c would have to be defined variables.
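
One possible way to give every sample the same number of features is to pad the shorter rows. This is only a minimal sketch: the helper name pad_rows and the fill value 0 are assumptions made for illustration, and padding is only reasonable if 0 cannot be confused with a real feature value in your data.

```python
def pad_rows(rows, fill_value=0):
    """Right-pad each row with fill_value so all rows match the longest row.

    pad_rows is a hypothetical helper, not part of scikit-learn.
    """
    width = max(len(row) for row in rows)
    return [list(row) + [fill_value] * (width - len(row)) for row in rows]


training_data = [[1, 2, 3], [1, 2, 4, 5, 6], [5, 7], [1, 2, 3]]

# Every row now has the same length as the longest original row.
padded_data = pad_rows(training_data)

for row in padded_data:
    print(row)
```

The rectangular padded_data can then be passed to clf.fit(padded_data, training_labels) in place of the original ragged list. If the numbers are really identifiers rather than ordered measurements, a different encoding (for example one column per distinct value) may suit the problem better than padding.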