Tags: python, keras, scikit-learn, multilabel-classification, multiclass-classification

could not broadcast input array from shape (27839,1) into shape (27839)


I'm building a classifier chain for a multilabel problem that uses a Keras binary classifier model at each link of the chain. I have 17 labels as the classification target; the shape of X_train is (111300, 107) and y_train is (111300, 17). After training, I got the following error in the predict method:

    could not broadcast input array from shape (27839,1) into shape (27839)

My code is here:

def create_model():
  input_size=length_long_sentence
  embedding_size=128
  lstm_size=64
  output_size=len(unique_tag_set)
  #----------------------------Model--------------------------------
  current_input = Input(shape=(input_size,))
  emb_current = Embedding(vocab_size, embedding_size, input_length=input_size)(current_input)
  out_current = Bidirectional(LSTM(units=lstm_size))(emb_current)
  #out_current = Reshape((1,2*lstm_size))(out_current)
  output = Dense(units=1, activation='sigmoid')(out_current)
  #output = Dense(units=1, activation='softmax')(out_current)
  model = Model(inputs=current_input, outputs=output)
  #-------------------------------compile-------------
  model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
  return model
model = KerasClassifier(build_fn=create_model, epochs=1,batch_size=256, shuffle = True, verbose = 1,validation_split=0.2)
chain=ClassifierChain(model, order='random', random_state=42)
history=chain.fit(X_train, y_train)

The result of chain.classes_ is given below:

[array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8),
 array([0, 1], dtype=uint8)]

Then, trying to predict on the test data:

Y_pred_chain = chain.predict(X_test)

The summary of the model is given below: [model summary image]

The full trace of the error is here:

109/109 [==============================] - 22s 202ms/step
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-34a25ad06cd4> in <module>()
----> 1 Y_pred_chain = chain.predict(X_test)

/usr/local/lib/python3.6/dist-packages/sklearn/multioutput.py in predict(self, X)
    523             else:
    524                 X_aug = np.hstack((X, previous_predictions))
--> 525             Y_pred_chain[:, chain_idx] = estimator.predict(X_aug)
    526 
    527         inv_order = np.empty_like(self.order_)

ValueError: could not broadcast input array from shape (27839,1) into shape (27839)

Can anyone help me figure out how to fix this error?


Solution

  • Stage 1

    Going by the model summary posted in the question, I start from an input size of 107 and an output size of 1 (a binary classification task).

    Let's break it into pieces and understand it.

    The Model architecture

    # Imports assumed for this sketch: standalone Keras plus scikit-learn
    # (swap in the tensorflow.keras equivalents if that is what you use)
    import numpy as np
    from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
    from keras.models import Model
    from keras.wrappers.scikit_learn import KerasClassifier
    from sklearn.multioutput import ClassifierChain

    input_size = 107

    # define the model
    def create_model():
      global input_size
      embedding_size = 128
      lstm_size = 64
      output_size = 1
      vocab_size = 100

      current_input = Input(shape=(input_size,))
      emb_current = Embedding(vocab_size, embedding_size, input_length=input_size)(current_input)
      out_current = Bidirectional(LSTM(units=lstm_size))(emb_current)
      output = Dense(units=output_size, activation='sigmoid')(out_current)
      model = Model(inputs=current_input, outputs=output)
      model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
      return model
    

    Some dummy data

    X = np.random.randint(0,100,(111, 107))
    y = np.random.randint(0,2,(111,1))  # NOTE: The y should have two dimensions
    

    Let's test the Keras model directly:

    model = KerasClassifier(build_fn=create_model, epochs=1, batch_size=8, shuffle = True, verbose = 1,validation_split=0.2)
    model.fit(X, y)
    y_hat = model.predict(X)
    print(y_hat.shape)
    

    Output:

    Train on 88 samples, validate on 23 samples
    Epoch 1/1
    88/88 [==============================] - 2s 21ms/step - loss: 0.6951 - accuracy: 0.4432 - val_loss: 0.6898 - val_accuracy: 0.5652
    111/111 [==============================] - 0s 2ms/step
    (111, 1)
    

    Ta-da! It works.

    Now let's chain it and run:

    model=KerasClassifier(build_fn=create_model, epochs=1, batch_size=8, shuffle=True, verbose=1,validation_split=0.2)
    chain=ClassifierChain(model, order='random', random_state=42)
    chain.fit(X, y)
    print (chain.predict(X).shape)
    

    Oops! It trains, but prediction fails with the error the OP points out:

    ValueError: could not broadcast input array from shape (111,1) into shape (111)
    

    The problem

    This error is caused by the line below in sklearn:

    --> 525             Y_pred_chain[:, chain_idx] = estimator.predict(X_aug)
    

    The classifier chain runs the estimators one at a time and saves each estimator's predictions in Y_pred_chain at that estimator's index (determined by the order parameter). It assumes that the estimators return their predictions as a 1D array. But Keras models return output of shape batch_size x output_size, which in our case is 111 x 1.
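
    To see the failure in isolation, here is a minimal standalone sketch (the shape 111 is just the dummy-data size from above) of what sklearn does when it writes an estimator's predictions into a column of its pre-allocated result array:

    import numpy as np

    Y_pred_chain = np.zeros((111, 1))   # sklearn pre-allocates a 2D result array
    preds_2d = np.zeros((111, 1))       # what the Keras wrapper returns here
    preds_1d = preds_2d.reshape(111)    # what sklearn expects

    Y_pred_chain[:, 0] = preds_1d       # works
    Y_pred_chain[:, 0] = preds_2d       # ValueError: could not broadcast input array
                                        # from shape (111,1) into shape (111)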

    The solution

    We need a way to reshape the predictions from shape 111 x 1 to 111, or in general from batch_size x 1 to batch_size. Let's lean on OOP and override the predict method of KerasClassifier:

    class MyKerasClassifier(KerasClassifier):
      def __init__(self, **args):
        super().__init__(**args)
    
      def predict(self, X):
        return super().predict(X).reshape(len(X)) # Here we are flattening 2D array to 1D
    
    model=MyKerasClassifier(build_fn=create_model, epochs=1, batch_size=8, shuffle=True, verbose=1,validation_split=0.2)
    chain=ClassifierChain(model, order='random', random_state=42)
    chain.fit(X, y)
    print (chain.predict(X).shape)
    

    Output:

    Epoch 1/1
    88/88 [==============================] - 2s 19ms/step - loss: 0.6919 - accuracy: 0.5227 - val_loss: 0.6892 - val_accuracy: 0.5652
    111/111 [==============================] - 0s 3ms/step
    (111, 1)
    

    Ta-da! It works.

    Stage 2

    Let's look deeper into the ClassifierChain class. From its documentation:

    A multi-label model that arranges binary classifiers into a chain.

    Each model makes a prediction in the order specified by the chain using all of the available features provided to the model plus the predictions of models that are earlier in the chain.

    So what we really need is a y of shape 111 x 17, so that the chain contains 17 estimators. Let's try it.

    The real ClassifierChain

    y = np.random.randint(0,2,(111,17))
    model=MyKerasClassifier(build_fn=create_model, epochs=1, batch_size=8, shuffle=True, verbose=1,validation_split=0.2)
    chain=ClassifierChain(model, order='random', random_state=42)
    chain.fit(X, y)
    

    Output:

    ValueError: Error when checking input: expected input_62 to have shape (107,) but got array with shape (108,)
    

    It cannot train the model; the reason is pretty simple. The chain first trains the first estimator with 107 features, which works fine. Next, the chain picks up the second estimator and trains it with the 107 features plus the single output of the previous estimator (= 108). But since our model has an input size of 107, it fails with the above error message. In general, each estimator gets the 107 input features plus the outputs of all the previous estimators, as the sketch below illustrates.
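
    A rough sketch (using the feature and label counts from this example) of the augmented input size each estimator in the chain receives:

    n_features, n_labels = 107, 17
    for chain_idx in range(n_labels):
        # estimator i sees the original features plus the i previous predictions
        print(f"estimator {chain_idx}: input size = {n_features + chain_idx}")
    # estimator 0: input size = 107
    # estimator 1: input size = 108
    # ...
    # estimator 16: input size = 123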

    The solution [hacky]

    We need a way to change the input_size of the models as they are created by the ClassifierChain. There seem to be no callbacks or hooks into the ClassifierChain, so here is a hacky solution.

    input_size = 107    
    
    # define the model
    def create_model():
      global input_size
      embedding_size=128
      lstm_size=64
      output_size=1
      vocab_size = 100
    
      current_input = Input(shape=(input_size,))
      emb_current = Embedding(vocab_size, embedding_size, input_length=input_size)(current_input)
      out_current = Bidirectional(LSTM(units=lstm_size))(emb_current)
      output = Dense(units=output_size, activation='sigmoid')(out_current)
      model = Model(inputs=current_input, outputs=output)
      model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
    
      input_size += 1 # <-- This does the magic
      return model
    
    X = np.random.randint(0,100,(111, 107))
    y = np.random.randint(0,2,(111,17))
    model=MyKerasClassifier(build_fn=create_model, epochs=1, batch_size=8, shuffle=True, verbose=1,validation_split=0.2)
    chain=ClassifierChain(model, order='random', random_state=42)
    chain.fit(X, y)
    print (chain.predict(X).shape)
    

    Output:

    Train on 88 samples, validate on 23 samples
    Epoch 1/1
    88/88 [==============================] - 2s 22ms/step - loss: 0.6901 - accuracy: 0.6023 - val_loss: 0.7002 - val_accuracy: 0.4783
    Train on 88 samples, validate on 23 samples
    Epoch 1/1
    88/88 [==============================] - 2s 22ms/step - loss: 0.6976 - accuracy: 0.5000 - val_loss: 0.7070 - val_accuracy: 0.3913
    Train on 88 samples, validate on 23 samples
    Epoch 1/1
    ----------- [Output truncated] ----------------
    111/111 [==============================] - 0s 3ms/step
    111/111 [==============================] - 0s 3ms/step
    (111, 17)
    

    As expected, it trains 17 estimators, and the predict method returns an output of shape 111 x 17, with each column corresponding to one of the 17 labels.
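
    One caveat of the hacky global: every call to create_model bumps input_size, so after fitting the 17-estimator chain the global sits at 124. If you want to fit the chain again (or build a fresh one) in the same session, reset it first. A minimal sketch, reusing the names from above:

    # Reset the global before fitting again; otherwise the first estimator
    # would be built with input_size = 124 instead of 107.
    input_size = 107
    chain = ClassifierChain(model, order='random', random_state=42)
    chain.fit(X, y)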