I am new to machine learning and am trying to apply logistic regression to my sample data set. I have a single feature that contains a list of numbers, and I want to predict the class.
The following is my code:
from sklearn.linear_model import LogisticRegression
a = [[1,2,3], [1,2,3,4,5,6], [4,5,6,7], [0,0,0,7,1,2,3]]
b = [0,1,0, 0]
p = [[9,0,2,4]]
clfModel1 = LogisticRegression(class_weight='balanced')
clfModel1.fit(a,b)
clfModel1.predict(p)
I am getting the following error:
Traceback (most recent call last):
File "F:\python_3.4\NLP\t.py", line 7, in <module>
clfModel1.fit(a,b)
File "C:\Python34\lib\site-packages\sklearn\linear_model\logistic.py", line 1173, in fit
order="C")
File "C:\Python34\lib\site-packages\sklearn\utils\validation.py", line 521, in check_X_y
ensure_min_features, warn_on_dtype, estimator)
File "C:\Python34\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
Is there some way to change the data so that I can then apply the classifier and predict the results?
Logistic regression is an estimator for functions of the form:
R^d -> [0,1]
But your data is clearly not a subset of R^d, because each sample in a has a different length (number of dimensions), so logistic regression cannot be applied to it as it stands.
Another thing to watch is that p must be a list of samples too, not a single flat sample, and each of its samples has to have the same d dimensions as the training data, of course.
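The length mismatch can be checked directly before calling fit. This is only a small sketch using the data from the question; the names lengths and is_rectangular are introduced here for illustration.

```python
# Training data from the question: each inner list is one sample.
a = [[1, 2, 3], [1, 2, 3, 4, 5, 6], [4, 5, 6, 7], [0, 0, 0, 7, 1, 2, 3]]

# The samples lie in R^d only if every sample has the same length d.
lengths = [len(sample) for sample in a]
is_rectangular = len(set(lengths)) == 1

print(lengths)         # [3, 6, 4, 7]
print(is_rectangular)  # False
```

Because the lengths differ, the data cannot be turned into a regular 2-D numeric array, which is what the classifier expects.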
There is no "way around this"; feeding variable-length samples directly to logistic regression is simply the wrong idea. The typical solution when working with such "odd" data is to either rethink the representation of your data or change the approach. There is no other way.
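As one possible illustration of "rethinking the representation", the sketch below pads every sample with zeros to a common length d so that the data becomes a subset of R^d. This is an assumption made purely for the example: whether zero-padding is meaningful depends entirely on what the numbers represent, and pad_to_length is a hypothetical helper, not part of scikit-learn.

```python
from sklearn.linear_model import LogisticRegression


def pad_to_length(sample, d, fill=0):
    """Hypothetical helper: truncate or right-pad a sample to exactly d values."""
    return list(sample[:d]) + [fill] * max(0, d - len(sample))


# Data from the question.
a = [[1, 2, 3], [1, 2, 3, 4, 5, 6], [4, 5, 6, 7], [0, 0, 0, 7, 1, 2, 3]]
b = [0, 1, 0, 0]
p = [[9, 0, 2, 4]]  # a list of samples, here containing one sample

# Assumption: use the longest training sample as the common dimension d.
d = max(len(sample) for sample in a)

# Every row now has exactly d values, so the data is rectangular.
X = [pad_to_length(sample, d) for sample in a]
P = [pad_to_length(sample, d) for sample in p]

clfModel1 = LogisticRegression(class_weight='balanced')
clfModel1.fit(X, b)
prediction = clfModel1.predict(P)
print(prediction)
```

This only removes the shape error. With four training samples the prediction itself says very little, and a different fixed-length representation (for example, summary statistics of each list) may suit the data better.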