While writing an implementation of a region proposal network based on Faster R-CNN in the Keras functional API, I have come across an issue for which I haven't found a clear solution after some searching.
I have a custom layer, call it the Roi_Projection_Layer, which I want to implement in Keras. This layer should take a feature map of shape = (None, 32, 19, 512) (the first dimension is the batch size), as well as an anchor-box tensor of shape = (None, 1, 4), where for example sample_anchor_box = [x_centre, y_centre, box_width, box_height].
I wish to pass both these tensors, which are clearly of different shapes, to a Keras layer so I can use the centre and shape of the anchor box as projection parameters, i.e. extract a specific 3 by 3 window in the spatial dimensions of the feature map to be passed on to further layers in the model.
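For example, if the feature map has a stride of 16 with respect to the input image (this depends on the backbone), an anchor centred at (x_centre, y_centre) = (160, 96) in image coordinates would correspond to cell (160/16, 96/16) = (10, 6) of the feature map, and I want the 3 by 3 window of features around that cell.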
I am not sure how to do this. One idea I've had is to append the anchor box values to the spatial dimension of each channel, i.e. pass in a feature map of spatial size (32*19 + 4), but what I am unsure about is this: if you modify the inputs outside of strict Keras layer operations, will the model lack this code once it is compiled? Any insights appreciated.
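To make the setup concrete, here is roughly the structure I have in mind (the layer name and the body of call are placeholders, not working projection logic):

    import tensorflow as tf
    from tensorflow import keras

    # Sketch only: a custom layer's call() can receive a list of tensors,
    # so the two differently shaped inputs arrive at the layer together.
    class RoiProjectionSketch(keras.layers.Layer):
        def call(self, inputs):
            feature_map, anchor_boxes = inputs  # (None, 32, 19, 512) and (None, 1, 4)
            # ... use anchor_boxes to slice a 3 by 3 window out of feature_map ...
            return feature_map

    feature_map_in = keras.Input(shape=(32, 19, 512))
    anchor_boxes_in = keras.Input(shape=(1, 4))
    window = RoiProjectionSketch()([feature_map_in, anchor_boxes_in])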
Since I didn't get an answer for this, I will post my attempt/investigation.
I was able to code an ROI projection layer by subclassing keras.layers.Layer, where the input is a list containing a single tuple. The first element of the tuple is a single image (the feature map), and the second element is the set of anchor boxes of the form [x_min, y_min, width, height]. I ended up padding the feature map with zeros since the next convolutional layers in Faster R-CNN take a 3 by 3 window as input, so for anchor boxes that map to the boundary pixels of the feature map we need padding.
import tensorflow as tf
from tensorflow import keras

class RoiProjectionLayer(keras.layers.Layer):
    def __init__(self, stride):
        super(RoiProjectionLayer, self).__init__()
        self.stride = stride

    def call(self, inputs):
        projected_feature_maps = []
        # Number of anchor boxes; must be statically known, hence the fixed batch size.
        batch_size = inputs[0][1].shape[0]
        # Pad the single feature map once with zeros so that 3 by 3 windows
        # centred on boundary pixels are still valid.
        feature_map = inputs[0][0][0]
        padding_values = tf.constant([[2, 2], [2, 2], [0, 0]])
        feature_map = tf.pad(feature_map, padding_values, "CONSTANT")
        for i in range(batch_size):
            # x centre (after padding) of the anchor box location in the feature map
            x_val = tf.dtypes.cast(inputs[0][1][i][0] / self.stride, tf.int32) + 2
            # y centre (after padding) of the anchor box location in the feature map
            y_val = tf.dtypes.cast(inputs[0][1][i][1] / self.stride, tf.int32) + 2
            # Slice out the 3 by 3 spatial window centred on the anchor.
            projected_feature_maps.append(feature_map[x_val - 1:x_val + 2, y_val - 1:y_val + 2, :])
        return tf.stack(projected_feature_maps)
Basically, the key was to note that layers in Keras can take lists of tuples of tensors. This worked even in non-eager execution; however, I had to set the batch size as a fixed value rather than leaving it as None, so that the loop in call knows how many anchor boxes to iterate over.
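For reference, a minimal sketch of how the layer above might be wired into a functional model (the shapes, the stride of 16, and the anchor count of 256 are illustrative assumptions). Fixing the batch shapes is what makes inputs[0][1].shape[0] a static integer when the layer is traced:

    # Continuing from the RoiProjectionLayer definition above.
    feature_map_in = keras.Input(batch_shape=(1, 32, 19, 512))  # a single image
    anchor_boxes_in = keras.Input(batch_shape=(256, 4))         # fixed number of anchors
    roi_windows = RoiProjectionLayer(stride=16)([(feature_map_in, anchor_boxes_in)])
    model = keras.Model(inputs=[feature_map_in, anchor_boxes_in], outputs=roi_windows)
    # model.predict([feature_maps, anchor_boxes]) should then yield a
    # (256, 3, 3, 512) tensor of projected windows, one per anchor box.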