I don't understand why this extremely easy input-output problem can't be learned by the following ANN. I guess there is a mistake in my code, but I can't find it.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.array([[ 1., 1., -1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
Y = np.array([[ 1., 1., -5., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
model = Sequential()
model.add(Dense(4, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='linear'))
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
erg = model.fit(X,Y, epochs=10, batch_size=1, verbose=1)
As Matias mentioned, you are using a softmax activation in the last layer, and it can't produce the targets in your dataset: softmax is only useful as an output-layer activation when your targets are some kind of probabilities (non-negative values that sum to 1). Using linear instead of softmax should help:
model.add(Dense(4, activation='linear'))
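To see why softmax can never fit a target like -5, here is a minimal sketch (a plain NumPy softmax for illustration, not Keras's internal implementation): every softmax output is non-negative and the outputs sum to 1, so negative targets are unreachable regardless of training.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5, 3.0])
probs = softmax(logits)

print(probs.sum())  # always 1.0 (up to floating-point error)
print(probs.min())  # always >= 0, so a target of -5 can never be produced
```

A linear activation, by contrast, places no constraint on the output range, which is what a regression target like yours needs.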