Tags: python, logistic-regression

ValueError: data type must provide an itemsize?


My code is below. Every time I run it, it fails with this error:

"ValueError: data type must provide an itemsize"

I can't figure out why it doesn't work.

from sklearn.linear_model import LogisticRegression
trainX = [('2', '0.455', '0.365', '0.095', '0.514', '0.2245', '0.101', '0.15'), ('2', '0.35', '0.265', '0.09', '0.2255', '0.0995', '0.0485', '0.07'), ('1', '0.53', '0.42', '0.135', '0.677', '0.2565', '0.1415', '0.21'), ('2', '0.44', '0.365', '0.125', '0.516', '0.2155', '0.114', '0.155'), ('3', '0.33', '0.255', '0.08', '0.205', '0.0895', '0.0395', '0.055')]
trainY = ['15', '7', '9', '10', '7']
testX = [('3', '0.475', '0.36', '0.11', '0.452', '0.191', '0.099', '0.13'), ('3', '0.485', '0.37', '0.14', '0.5065', '0.2425', '0.088', '0.1465')]
model = LogisticRegression()
model.fit(trainX, trainY)
predict = model.predict(testX[0:2])  # error
print(predict)
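One way to see the root cause is to check the dtype NumPy infers when it turns these lists into arrays. Because every value is a quoted string, the result is a Unicode string array (dtype kind 'U'), not a numeric one. The snippet below uses a shortened copy of the question's data purely for illustration.

```python
import numpy as np

# Shortened copy of the question's data: every value is a quoted string.
trainX = [('2', '0.455', '0.365'), ('1', '0.53', '0.42')]
trainY = ['15', '7']

X = np.asarray(trainX)
y = np.asarray(trainY)

# Kind 'U' means a Unicode string array, which is not a numeric dtype.
print(X.dtype.kind, y.dtype.kind)
print(np.issubdtype(X.dtype, np.number))
```

Both arrays come out with kind 'U', and the numeric check is False, so the model is being handed text rather than numbers.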

Solution

  • trainX, trainY and testX contain strings such as '0.455', but LogisticRegression requires numeric data. Convert them to float arrays with numpy first, then fit the model as shown below:

    >>> from sklearn.linear_model import LogisticRegression
    >>> import numpy as np
    >>> trainX = [('2', '0.455', '0.365', '0.095', '0.514', '0.2245', '0.101', '0.15'), ('2', '0.35', '0.265', '0.09', '0.2255', '0.0995', '0.0485', '0.07'), ('1', '0.53', '0.42', '0.135', '0.677', '0.2565', '0.1415', '0.21'), ('2', '0.44', '0.365', '0.125', '0.516', '0.2155', '0.114', '0.155'), ('3', '0.33', '0.255', '0.08', '0.205', '0.0895', '0.0395', '0.055')]
    >>> trainY = ['15', '7', '9', '10', '7']
    >>> testX = [('3', '0.475', '0.36', '0.11', '0.452', '0.191', '0.099', '0.13'), ('3', '0.485', '0.37', '0.14', '0.5065', '0.2425', '0.088', '0.1465')]
    >>> model = LogisticRegression()
    >>> trainX = np.array(trainX, dtype=float)
    >>> trainY = np.array(trainY, dtype=float)
    >>> testX = np.array(testX, dtype=float)
    >>> model.fit(trainX, trainY)
    LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
              intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
    >>> predict = model.predict(testX[0:2])
    >>> predict
    array([ 7.,  7.])
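The same fix can also be written as a standalone Python 3 script. This is a sketch under a few assumptions: the labels are converted to int rather than float (either works, since LogisticRegression treats them as discrete classes), and max_iter=1000 is an arbitrary choice to give the default solver room to converge. The exact predictions are not shown because they may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

trainX = [('2', '0.455', '0.365', '0.095', '0.514', '0.2245', '0.101', '0.15'),
          ('2', '0.35', '0.265', '0.09', '0.2255', '0.0995', '0.0485', '0.07'),
          ('1', '0.53', '0.42', '0.135', '0.677', '0.2565', '0.1415', '0.21'),
          ('2', '0.44', '0.365', '0.125', '0.516', '0.2155', '0.114', '0.155'),
          ('3', '0.33', '0.255', '0.08', '0.205', '0.0895', '0.0395', '0.055')]
trainY = ['15', '7', '9', '10', '7']
testX = [('3', '0.475', '0.36', '0.11', '0.452', '0.191', '0.099', '0.13'),
         ('3', '0.485', '0.37', '0.14', '0.5065', '0.2425', '0.088', '0.1465')]

# Parse the string features as floats and the string labels as ints.
X_train = np.asarray(trainX, dtype=float)
y_train = np.asarray(trainY, dtype=int)
X_test = np.asarray(testX, dtype=float)

# max_iter=1000 is an assumed value, not something the original answer uses.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predict = model.predict(X_test)
print(predict)
```

Keeping the labels as ints means predict returns the original class values (7, 9, 10 or 15) as integers instead of floats like 7.0.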