I am trying to reverse the output of LRP (a heatmap) back into the model's weights. I have decided to minimize the loss between the relevance produced by the untrained model's weights and the relevance scores of the desired heatmap, so in theory the optimization should drive the weights toward values that produce the desired heatmap. I am doing this via GradientTape. I am following this tutorial implementation of simple LRP for a fully connected model on the MNIST dataset; here is the model diagram
And this is the model implementation
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

num_classes = 10
img_width = img_height = 28  # MNIST images are 28x28
input_layer = Input(shape=(img_width * img_height,))
X = Dense(300, activation='relu', kernel_regularizer='l2')(input_layer)
X = Dense(100, activation='relu', kernel_regularizer='l2')(X)
X = Dense(10, activation='relu', kernel_regularizer='l2')(X)
model = Model(inputs=input_layer, outputs=X)
And this is the LRP function to get the relevance, where the arguments are the list of weight matrices W, the list of biases B, the input image img, and the one-hot prediction pred, and it returns R, the relevance of each neuron in each layer
def get_relevance_tf(W, B, img, pred):
    L = len(W)
    # Forward pass: collect the activations of every layer
    A = [img] + [None]*L
    for l in range(L):
        A[l+1] = tf.nn.relu(tf.matmul(A[l], W[l]) + B[l])
    # Relevance at the output layer, masked by the target prediction
    R = [0.0]*L + [A[L]*pred]
    # Propagate relevance backwards through the hidden layers
    for l in range(1, L)[::-1]:
        w = W[l]
        b = B[l]
        z = tf.matmul(A[l], w) + b              # step 1
        s = R[l+1] / z                          # step 2
        c = tf.matmul(s, w, transpose_b=True)   # step 3
        R[l] = A[l]*c                           # step 4
    # Input layer: z^B rule for inputs bounded in [-1, 1]
    w = W[0]
    wp = tf.math.maximum(0, w)
    wm = tf.math.minimum(0, w)
    lb = A[0]*0 - 1
    hb = A[0]*0 + 1
    z = tf.matmul(A[0], w) - tf.matmul(lb, wp) - tf.matmul(hb, wm) + 1e-9   # step 1
    s = R[1]/z                                                              # step 2
    c, cp, cm = (tf.matmul(s, w, transpose_b=True),
                 tf.matmul(s, wp, transpose_b=True),
                 tf.matmul(s, wm, transpose_b=True))                        # step 3
    R[0] = A[0]*c - lb*cp - hb*cm                                           # step 4
    return R
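For context, a minimal sketch of how the function can be called directly on the model's weights, together with a quick check of the LRP conservation property; the random img and one-hot target here are just stand-ins for the real MNIST tensors:

weights = model.get_weights()                                   # [W0, b0, W1, b1, W2, b2]
W = [tf.constant(w, dtype=tf.float32) for w in weights[::2]]    # kernel matrices
B = [tf.constant(b, dtype=tf.float32) for b in weights[1::2]]   # bias vectors

img = tf.random.uniform((1, 784), minval=-1.0, maxval=1.0)      # stand-in for a preprocessed digit
pred = tf.one_hot(3, 10)                                        # stand-in one-hot target class

R = get_relevance_tf(W, B, img, pred)

# LRP conservation: the summed relevance should stay roughly constant from layer to layer
for l, r in enumerate(R):
    print(f"layer {l}: total relevance = {tf.reduce_sum(r).numpy():.4f}")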
And this is the GradientTape code, where pred_R is the relevance score of the desired heatmap and model.10.hdf5 is the untrained model
model = tf.keras.models.load_model("model.10.hdf5")

img = tf.convert_to_tensor(X_train[index].reshape(1, 784), dtype=tf.float32)
pred = tf.convert_to_tensor(y_train_one_hot[index], dtype=tf.float32)

# Wrap the model's kernels and biases in trainable variables
W = [tf.Variable(i, dtype=tf.float32, trainable=True) for i in model.get_weights()[::2]]
B = [tf.Variable(i, dtype=tf.float32, trainable=True) for i in model.get_weights()[1::2]]

with tf.GradientTape() as tape:
    R = get_relevance_tf(W, B, img, pred)
    loss = tf.math.reduce_sum(tf.math.abs(R[0] - pred_R[0]))

grads = tape.gradient(loss, [W, B])
print(grads)
This is the output; as you can see, all the grads are zeros
[[<tf.Tensor: shape=(784, 300), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(300, 100), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(100, 10), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
...
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
dtype=float32)>, <tf.Tensor: shape=(10,), dtype=float32, numpy=array([-0., -0., -0., -0., -0., -0., -0., -0., -0., -0.], dtype=float32)>]]
I can't understand why the gradients are all zeros when the relevance depends directly on the weights.
Things I have tried
I have solved this issue: apparently the untrained model's weights sit at a local minimum of the loss function, which is why the gradients come out as zero for all weights.
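A sketch of the kind of check that makes this visible, assuming the same img, pred and pred_R as above: nudge the loaded weights by a small random amount and compare the loss before and after; around a flat point it should barely move.

def relevance_loss(W, B):
    R = get_relevance_tf(W, B, img, pred)
    return tf.math.reduce_sum(tf.math.abs(R[0] - pred_R[0]))

# Compare the loss at the loaded weights with the loss after a tiny random nudge;
# around a flat point (zero gradient) the difference should be negligible
W_eps = [w + tf.random.normal(w.shape, stddev=1e-3) for w in W]
B_eps = [b + tf.random.normal(b.shape, stddev=1e-3) for b in B]
print(relevance_loss(W, B).numpy(), relevance_loss(W_eps, B_eps).numpy())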
Note to future me: start with randomized values.
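Something along these lines is what I mean by starting from randomized values: initialize W and B randomly for the same layer sizes instead of loading them from model.10.hdf5, then run the optimization loop toward the desired heatmap (the normal initializer, Adam optimizer, learning rate and step count are arbitrary choices).

layer_sizes = [784, 300, 100, 10]
W = [tf.Variable(tf.random.normal((layer_sizes[i], layer_sizes[i + 1]), stddev=0.05))
     for i in range(len(layer_sizes) - 1)]
B = [tf.Variable(tf.random.normal((layer_sizes[i + 1],), stddev=0.05))
     for i in range(len(layer_sizes) - 1)]

opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
variables = W + B   # flat list of all trainable variables

for step in range(1000):
    with tf.GradientTape() as tape:
        R = get_relevance_tf(W, B, img, pred)
        loss = tf.math.reduce_sum(tf.math.abs(R[0] - pred_R[0]))
    grads = tape.gradient(loss, variables)
    opt.apply_gradients(zip(grads, variables))
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.numpy():.6f}")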