python, tensorflow, keras, softmax, activation-function

Is there a simple way to extend an existing activation function? My custom softmax function returns: An operation has `None` for gradient


I want to try making softmax faster by applying it only to the top k values of the vector.

To do that, I tried implementing a custom TensorFlow function to use in a model:

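import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

def softmax_top_k(logits, k=10):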
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    logits_shape = tf.shape(logits)
    return_value = tf.sparse_to_dense(indices, logits_shape, softmax)
    return_value = tf.convert_to_tensor(return_value, dtype=logits.dtype, name=logits.name)
    return return_value

I'm using the Fashion-MNIST dataset to test whether this approach works:

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# normalize the data
train_images = train_images / 255.0
test_images = test_images / 255.0

# split the training data into train and validate arrays (will be used later)
train_images, train_images_validate, train_labels, train_labels_validate = train_test_split(
    train_images, train_labels, test_size=0.2, random_state=133742,
)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=softmax_top_k)
])


model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(train_images_validate, train_labels_validate),
)

model_without_cnn.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model_without_cnn.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(train_images_validate, train_labels_validate),
)

But an error occurs during execution:

ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable).

I've found this: (How to make a custom activation function), which explains how to implement a completely custom activation function in TensorFlow. But since mine uses and extends softmax, I thought the gradient should still be the same.

This is my first week coding with Python and TensorFlow, so I don't yet have a good overview of the internal implementations.

Is there a simpler way to extend softmax into a new function, rather than implementing it from scratch?

Thanks in advance!


Solution

  • Instead of using sparse tensors to build the tensor that is all zeros except for the softmaxed top-k values, use tf.scatter_nd, which has a gradient defined for the scattered values:

    import tensorflow as tf
    
    def softmax_top_k(logits, k=10):
        values, indices = tf.nn.top_k(logits, k, sorted=False)
        softmax = tf.nn.softmax(values)
        logits_shape = tf.shape(logits)
        # Assuming that logits is 2D
        rows = tf.tile(tf.expand_dims(tf.range(logits_shape[0]), 1), [1, k])
        scatter_idx = tf.stack([rows, indices], axis=-1)
        return tf.scatter_nd(scatter_idx, softmax, logits_shape)
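
To make the index construction easier to follow, here is a minimal NumPy sketch of the same 2D computation. NumPy only stands in for TensorFlow here, so it says nothing about gradients, and the name `softmax_top_k_np` is hypothetical; it is meant purely to show which (row, column) pairs get written and what the result looks like.

```python
import numpy as np

def softmax_top_k_np(logits, k):
    """NumPy analogue of the 2D TensorFlow function (illustration only)."""
    logits = np.asarray(logits, dtype=np.float64)
    n_rows = logits.shape[0]
    # Column indices of the k largest entries per row (unsorted, like sorted=False)
    indices = np.argpartition(logits, -k, axis=1)[:, -k:]
    values = np.take_along_axis(logits, indices, axis=1)
    # Numerically stable softmax over the k selected values only
    exp = np.exp(values - values.max(axis=1, keepdims=True))
    softmax = exp / exp.sum(axis=1, keepdims=True)
    # Row index for every selected entry; plays the role of the tiled tf.range
    rows = np.tile(np.arange(n_rows)[:, None], (1, k))
    # Write the softmax values at the (row, column) pairs, zeros everywhere else
    out = np.zeros_like(logits)
    out[rows, indices] = softmax
    return out

example = softmax_top_k_np([[1.0, 3.0, 2.0, 0.0],
                            [0.5, 0.1, 0.9, 0.7]], k=2)
print(example)
```

Each row of `example` sums to 1 and has exactly k nonzero entries, located at the positions of the k largest logits in that row.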
    

    EDIT: Here is a slightly more complex version for tensors with an arbitrary number of dimensions. The code still requires that the number of dimensions is known at graph construction time, though.

    import tensorflow as tf
    
    def softmax_top_k(logits, k=10):
        values, indices = tf.nn.top_k(logits, k, sorted=False)
        softmax = tf.nn.softmax(values)
        # Make nd indices
        logits_shape = tf.shape(logits)
        dims = [tf.range(logits_shape[i]) for i in range(logits_shape.shape.num_elements() - 1)]
        grid = tf.meshgrid(*dims, tf.range(k), indexing='ij')
        scatter_idx = tf.stack(grid[:-1] + [indices], axis=-1)
        return tf.scatter_nd(scatter_idx, softmax, logits_shape)
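
The meshgrid step in the N-dimensional version can be checked in isolation with a small NumPy sketch. Again this is only an analogue under the assumption that NumPy's `meshgrid` with `indexing="ij"` behaves like the TensorFlow call for this purpose, and `scatter_top_k_nd` is a hypothetical name: one index grid is built per leading dimension, and the last grid (0..k-1) is swapped for the real top-k positions.

```python
import numpy as np

def scatter_top_k_nd(logits, k):
    """NumPy illustration of the N-d index construction (hypothetical helper)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Positions of the k largest entries along the last axis (unsorted)
    indices = np.argpartition(logits, -k, axis=-1)[..., -k:]
    values = np.take_along_axis(logits, indices, axis=-1)
    # Softmax over the selected values only
    exp = np.exp(values - values.max(axis=-1, keepdims=True))
    softmax = exp / exp.sum(axis=-1, keepdims=True)
    # One index grid per leading dimension, each with the same shape as `indices`
    dims = [np.arange(n) for n in logits.shape[:-1]]
    grid = np.meshgrid(*dims, np.arange(k), indexing="ij")
    # Drop the last grid (0..k-1) and use the real top-k positions instead
    index = tuple(grid[:-1]) + (indices,)
    out = np.zeros_like(logits)
    out[index] = softmax
    return out

rng = np.random.default_rng(0)
batch = rng.normal(size=(2, 3, 5))
result = scatter_top_k_nd(batch, k=2)
print(result.shape, result.sum(axis=-1))
```

With a (2, 3, 5) input and k=2, every slice along the last axis sums to 1 and has two nonzero entries, which is the behaviour the TensorFlow function above is aiming for.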