python-3.x machine-learning keras conv-neural-network mlp

Found input variables with inconsistent numbers of samples: [2, 8382]

my method has 2 input data model network

branches: 49The first branch is composed of an embedding followed by simple Multi-layer Perceptron (MLP) designed to handle input of the product description. The second branch is a CNN to operate over the product image data. These branches are then be concatenated together to form the final.

The problem is when we try to split the data with train_test_split by Cross Validation, It given as this error.

ValueError: Found input variables with inconsistent numbers of samples: [2, 8382]

MLP and CNN

def create_mlp(dim, regress=False):
    # define our MLP network
    model = Sequential()
    model.add(Dense(8, input_dim=dim, activation="relu"))
    model.add(Dense(4, activation="relu"))
    # check to see if the regression node should be added
    if regress:
        model.add(Dense(1, activation="linear"))
    # return our model
    return model

def create_cnn(width, height, depth, filters=(64, 32, 16), regress=False):
    # initialize the input shape and channel dimension, assuming
    # TensorFlow/channels-last ordering
    inputShape = (height, width, depth)
    chanDim = -1

    # define the model input
    inputs = Input(shape=inputShape)

    # loop over the number of filters
    for (i, f) in enumerate(filters):
        # if this is the first CONV layer then set the input
        # appropriately
        if i == 0:
            x = inputs

        # CONV => RELU => BN => POOL
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)

    # flatten the volume, then FC => RELU => BN => DROPOUT
    x = Flatten()(x)
    x = Dense(16)(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = Dropout(0.5)(x)

    # apply another FC layer, this one to match the number of nodes
    # coming out of the MLP
    x = Dense(4)(x)
    x = Activation("relu")(x)

    # check to see if the regression node should be added
    if regress:
        x = Dense(1, activation="linear")(x)

    # construct the CNN
    model = Model(inputs, x)

    # return the CNN
    return model

mlp = create_mlp(trainEmbedX.shape[1], regress=False)
cnn = create_cnn(64, 64, 3, regress=False)

combinedInput = concatenate([mlp.output, cnn.output])

x = Dense(4, activation="relu")(combinedInput)
x = Dense(1, activation="sigmoid")(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)
model.compile(loss="binary_crossentropy", metrics=['accuracy'], optimizer="adam") # binary_crossentropy

The error occurs here

n_folds=3
epochs=3
batch_size=128

#save the model history in a list after fitting so that we can plot later
model_history = [] 
for i in range(n_folds):
    print("Training on Fold: ",i+1)
    t_x, val_x, t_y, val_y = train_test_split([trainEmbedX,trainImagesX], trainY, test_size = 0.2, random_state = np.random.randint(1,1000, 1)[0])
    model_history.append(fit_and_evaluate(t_x, val_x, t_y, val_y, epochs, batch_size))
    print("======="*12, end="\n\n\n")

Training on Fold:  1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-84-651638774259> in <module>
      7 for i in range(n_folds):
      8     print("Training on Fold: ",i+1)
----> 9     t_x, val_x, t_y, val_y = train_test_split([trainEmbedX,trainImagesX], trainY, test_size = 0.2, random_state = np.random.randint(1,1000, 1)[0])
     10     model_history.append(fit_and_evaluate(t_x, val_x, t_y, val_y, epochs, batch_size))
     11     print("======="*12, end="\n\n\n")

~/anaconda3/envs/baron/lib/python3.6/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2182         test_size = 0.25
   2183 
-> 2184     arrays = indexable(*arrays)
   2185 
   2186     if shuffle is False:

~/anaconda3/envs/baron/lib/python3.6/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    258         else:
    259             result.append(np.array(X))
--> 260     check_consistent_length(*result)
    261     return result
    262 

~/anaconda3/envs/baron/lib/python3.6/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    233     if len(uniques) > 1:
    234         raise ValueError("Found input variables with inconsistent numbers of"
--> 235                          " samples: %r" % [int(l) for l in lengths])
    236 
    237 

ValueError: Found input variables with inconsistent numbers of samples: [2, 8382]

Solution

This error happens with mismatching dimensions of X and Y in train_test_split.

By looking at your snippet, you try to concatenate two arrays by [trainEmbedX,trainImagesX] which will add a dimension if the original arrays trainEmbedX and trainImagesX are not 1D, hence you have the shape [2, 8382] in the error.

So instead of [trainEmbedX,trainImagesX], I suggest to use np.concatenate to merge these two arrays by np.concatenate((trainEmbedX,trainImagesX),axis=1).