Tags: tensorflow, keras, regularized

Regularization function using weights from multiple layers?


I don't know if this is feasible, but I'm asking just in case. Here is the (simplified) architecture of my model.

Layer (type)           Output Shape           Param #    Connected to
======================================================================
input_1 (InputLayer)   [(None, 7, 7, 1024)]   0
conv (Conv2D)          (None, 7, 7, 10)       10240      input_1[0][0]

where each of the 10 filters in "conv" is a 1x1x1024 convolutional filter (with no bias, but that is irrelevant to this particular issue). I am currently using a custom regularization function on "conv" to make sure that the (1x1)x1024x10 matrix of filter weights has a nice property (basically, that all filter vectors are pairwise orthogonal; a simplified sketch of this kind of regularizer is given at the end of the question), and so far everything is working as expected. Now I also want the ability to disable training on some of these 10 filters. The only way I know how to do that would be to implement the 10 filters as independent layers, as follows:

Layer (type)             Output Shape           Param #    Connected to
========================================================================
input_1 (InputLayer)     [(None, 7, 7, 1024)]   0
conv_1 (Conv2D)          (None, 7, 7, 1)        1024       input_1[0][0]
conv_2 (Conv2D)          (None, 7, 7, 1)        1024       input_1[0][0]
conv_3 (Conv2D)          (None, 7, 7, 1)        1024       input_1[0][0]
...
conv_10 (Conv2D)         (None, 7, 7, 1)        1024       input_1[0][0]

followed by a Concatenate layer, and then to set the "trainable" parameter to True/False on each conv_i layer as I see fit. However, I now don't know how to implement my regularization function, which must be computed on the weights of all the conv_i layers simultaneously rather than independently. Is there a trick that I can use to implement such a function? Or, conversely, is there a way to freeze only part of the weights of a convolutional layer? Thanks!
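
For reference, here is a simplified sketch of the kind of pairwise-orthogonality regularizer I mean (the exact penalty and the helper name are illustrative, not my actual function):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

def pairwise_orthogonality_penalty(kernel):
    # kernel has shape (1, 1, 1024, 10); flatten it to a (1024, 10) matrix W
    # and penalize the off-diagonal entries of W^T W so that the 10 filter
    # vectors stay pairwise orthogonal.
    w = tf.reshape(kernel, (-1, kernel.shape[-1]))
    gram = tf.matmul(w, w, transpose_a=True)
    off_diag = gram - tf.linalg.diag(tf.linalg.diag_part(gram))
    return tf.reduce_sum(tf.square(off_diag))

conv = Conv2D(10, (1, 1), use_bias=False,
              kernel_regularizer=pairwise_orthogonality_penalty)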

Solution

For those interested, here is the working code for my problem following the advice provided by @LaplaceRicky.

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

class SpecialRegularization(tf.keras.Model):
    """In order to avoid a warning message when saving the model,
    I use the solution indicated here
    https://github.com/tensorflow/tensorflow/issues/44541
    and now inherit from tf.keras.Model instead of Layer.
    """
    def __init__(self, nfilters, **kwargs):
        super().__init__(**kwargs)
        self.inner_layers = [Conv2D(1, (1, 1)) for _ in range(nfilters)]

    def call(self, inputs):
        outputs = [l(inputs) for l in self.inner_layers]
        self.add_loss(self.define_your_regularization_here())
        return tf.concat(outputs, -1)

    def set_trainable_parts(self, trainables):
        """Set the trainable attribute independently on each filter."""
        for l, t in zip(self.inner_layers, trainables):
            l.trainable = t

    def define_your_regularization_here(self):
        # Reconstruct the original (1, 1, 1024, nfilters) kernel
        large_kernel = tf.concat([l.kernel for l in self.inner_layers], -1)
        return tf.reduce_sum(large_kernel * large_kernel[:, :, :, ::-1])
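
A minimal usage sketch (the optimizer, loss and trainable pattern are just examples): build the model around this block, freeze whichever filters you want via set_trainable_parts, and recompile so the new trainable flags are taken into account by fit().

inputs = tf.keras.Input(shape=(7, 7, 1024))
reg_block = SpecialRegularization(nfilters=10)
outputs = reg_block(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Keep the first two filters trainable, freeze the other eight.
reg_block.set_trainable_parts([True, True] + [False] * 8)

# Recompile after changing the trainable flags so fit() picks them up.
model.compile(optimizer="adam", loss="mse")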
        

Solution

  • One way to achieve this is to have a custom Keras layer that wraps all of the small conv layers and is responsible for computing the regularization loss.

    Example code:

    import tensorflow as tf

    def _get_losses(model, x):
        # Run a forward pass and collect the losses registered via add_loss().
        model(x)
        return model.losses

    def _get_grads(model, x):
        # Gradient of the regularization loss alone w.r.t. the trainable weights.
        with tf.GradientTape() as t:
            model(x)
            reg_loss = tf.math.add_n(model.losses)
        return t.gradient(reg_loss, model.trainable_weights)

    class SpecialRegularization(tf.keras.layers.Layer):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.inner_layers = [tf.keras.layers.Conv2D(1, (1, 1)) for _ in range(10)]

        def call(self, inputs, training=None):
            outputs = [l(inputs, training=training) for l in self.inner_layers]
            self.add_loss(self.define_your_regularization_here())
            return tf.concat(outputs, -1)

        def define_your_regularization_here(self):
            # Reconstruct the original kernel from the small per-filter kernels.
            large_kernel = tf.concat([l.kernel for l in self.inner_layers], -1)
            # Just an example penalty here; you should define your own
            # regularization using the entire kernel.
            return tf.reduce_sum(large_kernel * large_kernel[:, :, :, ::-1])

    tf.random.set_seed(123)
    inputs = tf.keras.Input(shape=(7, 7, 1024))
    outputs = SpecialRegularization()(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    # get_losses and get_grads are for demonstration purposes only.
    get_losses = tf.function(_get_losses)
    get_grads = tf.function(_get_grads)
    data = tf.random.normal((64, 7, 7, 1024))
    print(get_losses(model, data))
    # With this example penalty, the gradient w.r.t. the first kernel equals
    # twice the last kernel, confirming the loss couples the separate filters.
    print(get_grads(model, data)[0])
    print(model.layers[1].inner_layers[-1].kernel * 2)
    model.summary()
    '''
    [<tf.Tensor: shape=(), dtype=float32, numpy=-0.20446025>]
    tf.Tensor(
    [[[[ 0.02072023]
       [ 0.12973154]
       [ 0.11631528]
       ...
       [ 0.00804012]
       [-0.07299817]
       [ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
    tf.Tensor(
    [[[[ 0.02072023]
       [ 0.12973154]
       [ 0.11631528]
       ...
       [ 0.00804012]
       [-0.07299817]
       [ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 7, 7, 1024)]      0         
    _________________________________________________________________
    special_regularization (Spec (None, 7, 7, 10)          10250     
    =================================================================
    Total params: 10,250
    Trainable params: 10,250
    Non-trainable params: 0
    _________________________________________________________________
    '''
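
    Since the penalty is registered through add_loss(), it ends up in model.losses and Keras adds it to the training objective automatically when the model is compiled and fitted. A minimal training sketch (random data and an arbitrary loss, purely for illustration):

    # The add_loss() penalty is included in the total loss during fit().
    x = tf.random.normal((64, 7, 7, 1024))
    y = tf.random.normal((64, 7, 7, 10))
    model.compile(optimizer="adam", loss="mse")
    model.fit(x, y, epochs=1, batch_size=16)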