I am trying to implement a regularization term inside the loss function of Andrew Ng's sparse autoencoder. On page 15 of the lecture notes, a sparsity penalty term is introduced, computed as the sum of the Kullback-Leibler (KL) divergence between rho and rho_hat_j over all hidden-layer units. rho is a fixed value that forces the neurons to be mostly off, and rho_hat_j is the average output (activation) of neuron j over the whole training set.
I'm using Keras to implement the autoencoder. I know the great tutorial on building autoencoders on the Keras Blog, but I want to implement the described sparsity penalty term as a custom regularizer in Keras. Some older implementations addressing this question can be found (Link, Link), but because of the changes to the regularization API in Keras since version 1.2.0, they are deprecated and no longer work.
So I'm trying to implement it with something like this:
from keras import backend as K
from keras.regularizers import Regularizer

def kl_divergence(rho, rho_hat):
    # KL divergence between Bernoulli variables with means rho and rho_hat
    return rho * K.log(rho) - rho * K.log(rho_hat) + (1 - rho) * K.log(1 - rho) - (1 - rho) * K.log(1 - rho_hat)

class SparseActivityRegularizer(Regularizer):
    def __init__(self, p=0.1, sparsityBeta=3):
        self.p = p                        # target sparsity rho
        self.sparsityBeta = sparsityBeta  # penalty weight beta

    def __call__(self, x):
        regularization = 0
        # p_hat: average activation of each hidden unit over the batch
        p_hat = K.mean(x, axis=0)
        regularization += self.sparsityBeta * K.sum(kl_divergence(self.p, p_hat))
        return regularization

    def get_config(self):
        return {"name": self.__class__.__name__}
Is it correct?!
A BIG question that I have not found answered anywhere: what exactly is passed to the callable __call__ (as the x parameter)?
Am I correct that x is a 2-dimensional tensor in which each row belongs to one neuron and each column to one sample of the training set, so that cell (i, j) is the output of neuron i for sample j of the training set?
Update: Shorter Question
Consider a 3-layer autoencoder in Keras. How should I implement this overall cost function (written out in LaTeX after the symbol list below)?
beta: Sparsity penalty coefficient (e.g. 3)
s_2: Number of units in hidden layer
rho: Fixed value (e.g. 0.2)
m: Number of samples in training set
x_i: i'th sample of training set
a_2_j(x_i): Output of j'th unit of layer 2 for i'th sample of training set
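Here is the cost function written out in LaTeX, reconstructed from the symbol definitions above; this is the standard formulation from the lecture notes, so treat it as my assumption about what the original formula image showed:

J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a^{(2)}_j(x_i)

\mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j}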
Your code is right, but there is no code for your autoencoder model; it is just the regularizer of the hidden layer.
The x passed to __call__ is the activity (the output of the hidden layer), whose shape should be (?, hidden_dim). "?" is the number of samples and is not known before fitting, and hidden_dim is the number of neurons in the hidden layer; in my example below it is 250.
If you want to build the whole model, you need to define the other layers as well. Here is a toy example.
from keras.layers import Input, Dense
from keras.models import Model

x_input = Input(shape=(576,))
# rho = 0.1, beta = 6
regularizer = SparseActivityRegularizer(0.1, 6)
encoded = Dense(250, activation='relu', activity_regularizer=regularizer)(x_input)
decoded = Dense(576, activation='relu')(encoded)
ae = Model(inputs=x_input, outputs=decoded)
Then you can compile and fit the model via:
ae.compile(optimizer='adam', loss='mse')
ae.fit(x_train, x_train, epochs=1, batch_size=50)
So the overall loss function consists of two parts: 1) the mse loss assigned when you compile the model, and 2) the activity regularization added when you define the hidden layer (encoded in my example).
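To make that composition explicit, here is a minimal NumPy sketch (with made-up activations and reconstructions, purely for illustration) of the value Keras ends up minimizing for one batch. Note that p_hat is averaged over the current batch rather than over the whole training set, which is the main practical difference from the formula in the question.

import numpy as np

rho, beta = 0.1, 6.0

def kl(rho, rho_hat):
    # same Bernoulli KL divergence as in the regularizer above
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

batch_x = np.random.rand(50, 576)         # one batch of inputs
activations = np.random.rand(50, 250)     # hidden-layer outputs for that batch (hypothetical)
reconstruction = np.random.rand(50, 576)  # decoder outputs for that batch (hypothetical)

mse_part = np.mean((batch_x - reconstruction) ** 2)   # the loss='mse' term
rho_hat = activations.mean(axis=0)                    # average activation per hidden unit
sparsity_part = beta * np.sum(kl(rho, rho_hat))       # SparseActivityRegularizer(0.1, 6)

total_loss = mse_part + sparsity_part
print(mse_part, sparsity_part, total_loss)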