Tags: python, tensorflow, predict

Why does predicting with a TensorFlow model give different answers for each signal separately versus all signals at once?


I have created a TensorFlow model that takes 512 input samples per time step (input shape 1 * N * 512), and I would like to make predictions on new input.

I have a variable s that holds 19 signals of 512 samples each (a 19 * 512 array). If I predict the output of my model one signal at a time:

[DLmodel(s[i, :][np.newaxis, np.newaxis, :]).numpy()[0, :, 0] for i in range(19)]

I got this answer:

[[0.41768566 0.5564939 0.30202574 0.35190994 0.27736259 0.28247398 0.2699227 0.33878434 0.35135144 0.31779674 0.3259031 0.3272484 0.32065392 0.33836302 0.31446803 0.26727855 0.29702038 0.30528304 0.32032394]]

but if I predict directly with the 2D matrix (all signals at once) I get:

DLmodel(s[np.newaxis, :, :]).numpy()[0, :, 0]

[4.1768566e-01 3.5780075e-01 1.5305097e-01 9.7242827e-03 8.3400400e-06 2.6045337e-09 2.0279233e-11 1.0051511e-12 4.4332330e-13 2.3794513e-13 2.0760676e-13 1.8587506e-13 1.7166681e-13 1.7180506e-13 1.7025846e-13 1.5340669e-13 1.8261155e-13 1.4610023e-13 1.4570285e-13]

I don't understand why the answers are not equal.

I also don't understand why, if I build a 2D matrix input from a sliding window of 5 signals shifted by 1 sample each, I don't get the expected answer:

# First batch: 5 windows of size samples from signal 10, each window
# shifted by 1 sample relative to the previous one
Signals = []
k = 0
for i in range(int(437 * Fs), int(437 * Fs) + 5):
    Signals.append(Sigs[10, (k + i):(k + i) + size])
Signals = np.array(Signals)
Signals = np.expand_dims(Signals, axis=0)
print(DLmodel(Signals).numpy()[0, :, 0])

# Second batch: the same 5 windows, each shifted one sample further
Signals = []
for i in range(int(437 * Fs), int(437 * Fs) + 5):
    Signals.append(Sigs[10, (k + i + 1):(k + i + 1) + size])
Signals = np.array(Signals)
Signals = np.expand_dims(Signals, axis=0)
print(DLmodel(Signals).numpy()[0, :, 0])

This prints:

[0.9198115  0.98681784 0.997053   0.9992207  0.9997619 ]
           [0.92536646 0.9863089  0.99667054 0.99903715 0.999721  ]

I indented the second line so that the vertically aligned numbers (top and bottom) should be the same, since the second batch of windows is just the first batch shifted by one sample. This is very confusing.

Here's the model I used:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

DLmodel = Sequential()
DLmodel.add(LSTM(units=size, return_sequences=True, input_shape=(None, size),
                 activation='tanh'))  # , kernel_regularizer=L2(0.01)))
DLmodel.add(Dropout(0.3))
DLmodel.add(Dense(size // 2, activation="relu", kernel_initializer="uniform"))
DLmodel.add(Dropout(0.3))
DLmodel.add(Dense(size // 4, activation="relu", kernel_initializer="uniform"))
DLmodel.add(Dropout(0.3))
DLmodel.add(Dense(size // 8, activation="relu", kernel_initializer="uniform"))
DLmodel.add(Dropout(0.3))
DLmodel.add(Dense(1, activation="sigmoid", kernel_initializer="uniform"))

DLmodel.compile(loss='binary_crossentropy', optimizer='adam',
                metrics=['accuracy', 'mse'], run_eagerly=True)
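
As a quick sanity check on the shapes (a hypothetical probe, not part of the original code; it assumes size = 512 and an all-zero input): with return_sequences=True and a final Dense(1), the model maps (1, N, 512) to (1, N, 1), i.e. one sigmoid value per time step, which is why the outputs above are read with [0, :, 0].

import numpy as np

x = np.zeros((1, 19, size), dtype=np.float32)  # 1 batch element, 19 time steps, 512 features
print(DLmodel(x).shape)  # (1, 19, 1): one output per time step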

Solution

  • You use an LSTM layer, and this layer has an internal memory that is initialized with an all-zero vector for every new input. An LSTM layer is meant for (time-)series data, for example the traffic at a certain hour of the day, or the sequence of words within a sentence. Usually you feed one or several samples into an LSTM layer, so your data has the shape (num_samples, num_steps, num_features).

    num_steps is the number of "sub-samples" (time steps) within your data. For example, for the sentence "I am at home", the first time step is a numeric representation of the word "I", the second a numeric representation of the word "am", and so on. The data could also be 12 cars driving down a street between 5:00 and 6:00 am (time step 0) and 48 cars driving between 6:00 and 7:00 am (time step 1).

    For every time step passed into this layer, the memory is adjusted. In consequence: feeding in time step 1 without having fed in time step 0 first gives a different result than feeding in time step 0 and then time step 1.

    And that is the entry point for your problem: in the upper part of your example you feed the signals in separately, so the internal memory starts from zero for every signal. In the second case, you feed the data in as one time series, so the first result is the same (it is computed from the all-zero memory), but all later results use the memory built up by the preceding time steps. You have therefore produced two different cases. Unfortunately, you did not provide details about your data, but one of the two cases is the "right" one for your task, while the other will not work as you intend. The runnable sketch after the two cases below demonstrates the difference.

    The two cases:

    Your first case:

    1. Start memory = [0, 0, 0, ...]
    2. input 19 samples, one call each with shape (1, 1, 512) # 1 sample, 1 time step, 512 features
    3. insert 1st sample -> calculate output based on all-zero memory
    4. insert 2nd sample -> calculate output based on all-zero memory
    5. and so on

    Second case:

    1. Start memory = [0, 0, 0, ...]
    2. input shape = (1, 19, 512) # 1 sample, 19 time steps, 512 features
    3. insert the whole input; the calculation runs over multiple sub-steps:
    • calculate 1st output, adjust memory
    • calculate 2nd output based on adjusted memory, adjust memory again
    • and so on
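
Here is a minimal runnable sketch of both cases. It uses toy sizes (4 signals of 8 features instead of 19 of 512) and an untrained model with random weights rather than your DLmodel; the names model, s, per_step, as_sequence and as_batch are illustrative only:

import numpy as np
import tensorflow as tf

size = 8     # toy feature length instead of 512
num_sig = 4  # toy number of signals instead of 19
rng = np.random.default_rng(0)
s = rng.standard_normal((num_sig, size)).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, size)),
    tf.keras.layers.LSTM(size, return_sequences=True, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Case 1: one time step per call. The LSTM state is re-initialized to
# zeros on every call, so each output depends only on its own signal.
per_step = np.array([model(s[i][np.newaxis, np.newaxis, :]).numpy()[0, 0, 0]
                     for i in range(num_sig)])

# Case 2: all signals in one call, stacked on the time axis. The state
# carries over from step to step within the sequence.
as_sequence = model(s[np.newaxis, :, :]).numpy()[0, :, 0]

print(np.isclose(per_step[0], as_sequence[0]))     # True: both use the zero state
print(np.allclose(per_step[1:], as_sequence[1:]))  # False in general

# Stacking the signals on the batch axis instead gives every signal its
# own fresh zero state and reproduces the one-at-a-time results.
as_batch = model(s[:, np.newaxis, :]).numpy()[:, 0, 0]
print(np.allclose(per_step, as_batch))             # True

Only the first pair of outputs agrees, because only the first step of the sequence call is computed from the all-zero memory. If you want independent per-signal predictions in a single call, stack the signals on the batch axis (shape (19, 1, 512)) rather than the time axis. The same reasoning explains your sliding-window experiment: step i of the second batch has a different history of preceding windows than step i+1 of the first batch, so the vertically aligned numbers are not equal.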