Tags: r, tensorflow, keras, neural-network

Custom Weight Regularization in Keras


I am attempting to implement a custom regularization method in Keras for R that will discourage negative weights during training. I have found supporting documentation for this in Python, but not for R.

In this method, I would like to identify negative weights and then apply regularization to those weights specifically. My current attempt is defined as

l1l2_reg <- function(weight_matrix) {
    neg <- which(weight_matrix < 0, arr.ind = T)
    return(0.0001 * sum(sum(weight_matrix[neg]^2)) + sum(sum(abs(weight_matrix[neg]^2))))
}

I am defining the usage of this within my model as

  reconstruct = bottleneck %>% 
    layer_dense(units = input_size, activation = "linear",
            kernel_regularizer = l1l2_reg,
            name = "reconstruct")

When the model is run, I am met with the error message

Error: Discrete value supplied to continuous scale

I believe this is occurring because the function is not correctly locating the weights, but I am unsure how to fix it. Based on the code above, it should identify the indices of the negative weights and then return the regularization term computed from those, but clearly my implementation is flawed. I primarily use MATLAB, so my approach may be skewed toward that as well.

What is the correct method of implementing this within R?


Solution

  • For most custom functions passed to Keras (in both Python and R), you generally have to stick to TensorFlow operations. In this case, which() and subsetting with an integer array via [neg] need to be updated to their TensorFlow equivalents: tf$where() and tf$gather_nd(). Or you can take a different approach altogether and use tf$maximum(), like in the example below.

    (The `[` method for tensors currently doesn't accept an array of arbitrary integer indices, only slice specs; in R, see ?`[.tensorflow.tensor` for details.)

    (sum(), abs(), ^, and * are R generics that automatically dispatch to the TensorFlow methods tf$reduce_sum(), tf$abs(), tf$pow(), and tf$multiply() when called with a tensor.)
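    As a quick illustration of that dispatch (assuming the tensorflow R package is installed and eager execution is enabled):

    x <- tf$constant(c(-1, 2, -3))
    sum(abs(x))  # dispatches to tf$reduce_sum(tf$abs(x))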

    You can update your l1l2_reg like this (note, the actual calculation is slightly different from what you wrote, to match the common meaning of "l1" and "l2"):

    library(tensorflow)
    library(keras)
    
    neg_l1l2_reg <- function(weight_matrix) {
      # -weight_matrix is positive exactly where the weights are negative,
      # so the elementwise maximum with zero keeps the magnitudes of the
      # negative weights and zeroes out everything else. (Taking the
      # maximum with weight_matrix itself would instead penalize the
      # positive weights, the opposite of what is wanted here.)
      x <- tf$maximum(tf$zeros_like(weight_matrix), -weight_matrix)
      l1 <- sum(abs(x)) * 0.0001
      l2 <- sum(x ^ 2) * 0.0001
      l1 + l2
    }
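
    If you'd rather keep the index-based approach from your attempt, the tf$where() / tf$gather_nd() route mentioned above can be sketched like this (an untested sketch; the comparison weight_matrix < 0 dispatches to tf$less()):

    neg_l1l2_reg_gather <- function(weight_matrix) {
      # Indices of the negative entries, as an (n, rank) int64 tensor
      neg_idx <- tf$where(weight_matrix < 0)
      # Gather just those entries into a 1-d tensor
      x <- tf$gather_nd(weight_matrix, neg_idx)
      0.0001 * sum(abs(x)) + 0.0001 * sum(x ^ 2)
    }

    Either function can then be passed as the kernel_regularizer in your layer_dense() call in place of l1l2_reg.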