
Fitting Multiple Input Columns in KNN Algorithm is Giving ValueError: setting an array element with a sequence


I have two input columns: the first is binary (zero or one) and the second is a feature vector of size 100. I want to fit both columns in a KNN model to predict the category column. I have already one-hot encoded the category column, which produced 15 extra columns (one per category).

When I fit the model it shows the following error:

ValueError: setting an array element with a sequence.

This is a part of my code:

import numpy as np
from sklearn.model_selection import train_test_split

X_level1 = np.asarray(dfCopy[['inputColumn1','inputColumn2']])
y_level1 = np.asarray(dfCopy[['OneHotEncodingColumn1','OneHotEncodingColumn2','OneHotEncodingColumn3',...,'OneHotEncodingColumn15']])

X_train1, X_val1, y_train1, y_val1 = train_test_split(X_level1, y_level1, test_size = 0.2, random_state=20)

This is a part of my input data:

array([[array([ 0.41164917,  0.33110523, -0.7823772 ,  0.12783737,  1.1618725 ,
       -0.7024268 ,  0.84284127,  1.5140213 ,  0.64215165, -1.6586455 ,
        0.46136633, -0.92533016,  0.50660706,  1.0788306 , -0.9702446 ,
        0.6586883 ,  1.7500123 , -0.15637057,  1.4345818 , -1.9476864 ,
        0.6294452 ,  0.12649943, -2.3380706 ,  0.61786395, -0.45559853,
       -0.5325301 ,  1.2698289 , -1.649353  , -0.18185338,  1.4399352 ,
        1.9842219 , -0.11131181,  0.42542225, -1.3662227 ,  0.57311517,
        3.4422836 , -0.9965432 , -0.58612174, -0.5525687 , -2.5889783 ,
       -0.8159157 , -1.8203335 , -0.58147144,  2.3315256 ,  0.42271224,
       -1.3675721 , -0.87182087,  0.6811211 , -1.5281016 ,  1.0560112 ,
        1.7546124 ,  1.3516003 ,  0.05760164,  0.4792729 ,  0.20388177,
        2.0917022 ,  0.26405442, -1.012274  , -0.7311924 , -0.4222189 ,
       -0.15046267,  1.838553  , -0.9228903 , -0.25226635, -2.7405736 ,
        1.0562496 ,  0.08701825,  0.42543337,  0.2115567 ,  1.3348918 ,
       -0.54058945,  1.2874343 ,  0.72596663, -2.399423  ,  1.7278377 ,
        1.3298786 , -0.6601989 ,  0.55112255, -0.60255444,  2.2411568 ,
        0.31967035,  1.7551464 , -0.70625794, -1.2612839 , -0.82214457,
        1.3652881 , -1.1309841 ,  0.3563959 ,  1.92157   ,  0.9091741 ,
       -0.09321591,  0.09579365,  0.87175727,  0.2785632 ,  1.8571266 ,
       -0.93616605, -0.09428027,  0.5034914 ,  0.55093   ,  1.0682331 ],
      dtype=float32),
        1],
       ...], dtype=object)

And this is part of the output data:

array([[0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 1, 0]], dtype=uint8)

Solution

  • Try converting your input from 2 columns into 101 columns (one column for each feature). Make sure the number of input rows equals the number of output rows, and that every row has the same number of features.

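    I think the error comes from the model trying to convert the input into a single 2-D numeric array before fitting. Each row of your array holds a whole 100-element array in one cell, and a sequence cannot be stored as a single number, hence "setting an array element with a sequence".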
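
The conversion to 101 columns could look roughly like the sketch below. It uses only NumPy and assumes each row is laid out as in the printed sample (the 100-element vector first, the binary value second). The helper name `flatten_features` and the toy data are made up for illustration and are not from the question.

```python
import numpy as np

def flatten_features(X_obj):
    """Turn an (n_rows, 2) object array of [vector, flag] rows into (n_rows, 101) floats."""
    # Stack the per-row vectors into one (n_rows, 100) block.
    vectors = np.vstack([row[0] for row in X_obj]).astype(np.float32)
    # The scalar binary value becomes one extra column.
    flags = np.array([row[1] for row in X_obj], dtype=np.float32).reshape(-1, 1)
    return np.hstack([vectors, flags])

# Toy stand-in for the real data: 3 rows of random values (hypothetical).
rng = np.random.default_rng(0)
X_obj = np.empty((3, 2), dtype=object)
for i in range(3):
    X_obj[i, 0] = rng.standard_normal(100).astype(np.float32)
    X_obj[i, 1] = i % 2

# Casting the object array directly reproduces the error from the question.
try:
    X_obj.astype(np.float32)
except ValueError as err:
    print("direct cast fails:", err)

X_flat = flatten_features(X_obj)
print(X_flat.shape, X_flat.dtype)  # (3, 101) float32
```

The flattened array can then be passed to `train_test_split` and to the KNN model in place of `X_level1`; the one-hot encoded `y_level1` stays as it is.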