Tags: python, tensorflow, keras, deep-learning, neural-network

Calling model.fit on target array of columns vs array of rows with separate output nodes


I would expect the two target layouts to be equivalent in how they are fed into the model, or else for one to be accepted and the other rejected: a list of columns vs a list of rows. My model takes 5 inputs and produces 3 outputs. During training, when my target data was a single NumPy array with 3 columns and 1000 rows, I got losses around 0.3, which is unusually high given that my loss on the test set is 0.01 when I predict. With the standard columns, the losses reported for the three output nodes are often similar to each other; with the rows, they differ greatly.
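A NumPy-only sketch of one plausible cause of the high, similar per-output losses described above. It assumes that, as in TF 2.x tf.keras, a single target array passed to a multi-output model is reused for every output, so each (N, 1) head is compared with all three columns by broadcasting. The data is random and hypothetical, scaled like F, G and H in the MRE.

```python
import numpy as np

# Hypothetical targets shaped like the MRE's: 300 rows, columns scaled by 2, 3, 4.
rng = np.random.default_rng(0)
y = rng.random((300, 3)) * np.array([2.0, 3.0, 4.0])


def mae(y_true, y_pred):
    # Keras-style MAE: average over the last axis, then over the batch.
    return float(np.mean(np.mean(np.abs(y_true - y_pred), axis=-1)))


# Assumed behaviour: one (300, 1) head is scored against the whole (300, 3) target.
# Broadcasting stretches the head to (300, 3), so one number per row has to
# approximate F, G and H at once. Even a head that predicts F exactly is penalised.
head_f = y[:, 0:1]
broadcast_loss = mae(y, head_f)

# Under this scoring the best a single head can do is the row median, and since
# every head sees the same target, all three would be pushed towards it.
row_median = np.median(y, axis=1, keepdims=True)
best_broadcast_loss = mae(y, row_median)

# With one (300, 1) column per head, a head that predicts F exactly scores 0.
per_head_loss = mae(y[:, 0:1], head_f)

print(broadcast_loss, best_broadcast_loss, per_head_loss)
```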

Transposing it with y.T does not work; the model only accepts the transposed targets once I convert them to a list with list(y.T). The losses are then much closer to what I expect. I thought this should not work and that passing in the columns is the right way, so why is my loss so much lower?
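Keras treats the first axis of X and of every target array as the sample axis. A small NumPy sketch of the layouts discussed above (the random ny is a hypothetical stand-in for the MRE's targets): as a single array, y.T claims 3 samples instead of 300, which is why fit rejects it next to a 300-row X, while list(y.T) gives one length-300 array per output.

```python
import numpy as np

# Hypothetical target matrix shaped like the MRE's `ny`: 300 samples x 3 targets.
ny = np.random.random((300, 3))

print(ny.shape)    # (300, 3): one array, rows are samples
print(ny.T.shape)  # (3, 300): one array whose first axis is now 3, not 300

targets = list(ny.T)  # three 1-D arrays, one per output head
print(len(targets), targets[0].shape)  # 3 (300,)

# Keeping a trailing axis lines each target up with a (None, 1) head directly.
targets_2d = [ny[:, i:i + 1] for i in range(ny.shape[1])]
print(targets_2d[0].shape)  # (300, 1)
```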

I know it sounds nonsensical, but here's an MRE (minimal reproducible example) that shows what is happening:

    import numpy as np
    import pandas as pd
    import tensorflow as tf

    # Shared trunk with three separate single-unit output heads
    inputs = tf.keras.layers.Input(shape=(5,))
    x = tf.keras.layers.Dense(units=16, activation="relu")(inputs)
    x = tf.keras.layers.Dense(units=16, activation="relu")(x)
    out1 = tf.keras.layers.Dense(1)(x)
    out2 = tf.keras.layers.Dense(1)(x)
    out3 = tf.keras.layers.Dense(1)(x)

    outputs = [out1, out2, out3]
    model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

    # Creating data: 300 rows, columns A-E are features, F-H are targets
    data = np.random.random(size=(300, 8))
    df = pd.DataFrame(data)
    df.columns = ["A", "B", "C", "D", "E", "F", "G", "H"]
    df["F"] = df["F"] * 2  # Scale the targets so each has a different range
    df["G"] = df["G"] * 3
    df["H"] = df["H"] * 4
    model.compile(loss="mae")
    X = df[["A", "B", "C", "D", "E"]]
    y = df[["F", "G", "H"]]
    ny = y.to_numpy()  # shape (300, 3)

    # history = model.fit(X, list(ny.T), epochs=200)  # Alternate: uncomment to use
    history = model.fit(X, ny, epochs=200)  # Standard
    pred = model.predict(X)  # list of three (300, 1) arrays
    pred = np.concatenate(pred, axis=1)  # shape (300, 3)
    resid = np.mean(np.abs(ny - pred), axis=0)  # per-target MAE on the training data
    print(resid)  # Training loss is lower for rows...

Any help is appreciated. Maybe defining the model with separate output branches as opposed to a single Dense(3) changes things?


Solution

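  • You need to pay attention to the output shape of your model. The way your outputs are configured, the output shape is [(None, 1), (None, 1), (None, 1)], so the model expects a list of three targets, each of shape (None, 1). If you use Dense(3), or merge the heads with tf.keras.layers.Concatenate()(outputs), the output has shape (None, 3), and that is what it expects instead. You can inspect the shape of your output with model.output_shape. A single (N, 3) array is not split across the three heads: at least in TF 2.x tf.keras, the same array is used as the target for every head, so each (None, 1) prediction is broadcast against all three columns. That is why the per-output losses come out similar and the total loss stays high.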
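A minimal sketch of both fixes from the answer above, assuming TensorFlow 2.x or Keras 3 via tf.keras. The random data, the hypothetical trunk() helper and the two-epoch training are illustrative choices only, not part of the original code. It reads shapes from model.outputs rather than model.output_shape, on the assumption that the former is available across Keras versions.

```python
import numpy as np
import tensorflow as tf

# Synthetic data in the spirit of the MRE: 5 features, 3 targets scaled by 2, 3, 4.
rng = np.random.default_rng(0)
X = rng.random((300, 5)).astype("float32")
ny = (rng.random((300, 3)) * np.array([2.0, 3.0, 4.0])).astype("float32")


def trunk():
    # Hypothetical helper: the shared hidden layers from the question.
    inputs = tf.keras.layers.Input(shape=(5,))
    x = tf.keras.layers.Dense(16, activation="relu")(inputs)
    x = tf.keras.layers.Dense(16, activation="relu")(x)
    return inputs, x


# Option A: three (None, 1) heads, so pass a list with one (N, 1) target per head.
inputs, x = trunk()
heads = [tf.keras.layers.Dense(1)(x) for _ in range(3)]
model_a = tf.keras.Model(inputs=inputs, outputs=heads)
model_a.compile(loss="mae")
targets_a = [ny[:, i:i + 1] for i in range(3)]  # like list(ny.T), plus a trailing axis
model_a.fit(X, targets_a, epochs=2, verbose=0)

# Option B: one (None, 3) head, so the (N, 3) array matches directly.
inputs, x = trunk()
model_b = tf.keras.Model(inputs=inputs, outputs=tf.keras.layers.Dense(3)(x))
model_b.compile(loss="mae")
model_b.fit(X, ny, epochs=2, verbose=0)

print([tuple(o.shape) for o in model_a.outputs])  # [(None, 1), (None, 1), (None, 1)]
print(tuple(model_b.outputs[0].shape))            # (None, 3)
```

Note that option A reports the sum of three per-head MAEs, while option B reports a single MAE averaged over the three columns, so their loss values are not directly comparable.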