Some approaches I have considered:
Inheriting from the Model class: see "Sampled softmax in tensorflow keras".
Inheriting from the Layer class: see "How can I use TensorFlow's sampled softmax loss function in a Keras model?"
Of the two approaches, the Model approach is cleaner. The Layer approach is a little hacky: it pushes the target in as part of the input, which rules out multi-output models.
I'd like some help subclassing the Model class. Specifically: 1) Unlike the first approach, I would like to take in any number of layers, as we do when specifying a standard Keras model. For example:
class LanguageModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
2) I am looking to incorporate the code below within the model class, but want to let the Model class recognize that
def call(self, y_true, input):
    """Reshape y_true and input so that they fit each other."""
    inputs = tf.reshape(input, (-1, self.hidden_size))
    labels = tf.reshape(y_true, (-1, 1))
    # Pseudocode: shapes and initial values are not specified here.
    weights = tf.Variable(tf.float64)
    biases = tf.Variable(tf.float64)
    loss = tf.nn.sampled_softmax_loss(
        weights=weights,
        biases=biases,
        labels=labels,
        inputs=inputs,
        ...,
        partition_strategy="div")
    logits = tf.matmul(inputs, tf.transpose(weights))
    logits = tf.nn.bias_add(logits, biases)
    y_predis = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=inputs[1],
        logits=logits)
3) I guess I need some pointers on which parts of the Model class in the functional API I should modify, knowing that I have to write a custom loss function like the one above. I guess the issue is accessing the weights in the tf.nn.sampled_softmax_loss function.
The simplest approach I can come up with is to define a loss that ignores the result of the output layer.
Full Colab here: https://colab.research.google.com/drive/1Rp3EUWnBE1eCcaisUju9TwSTswQfZOkS
The loss function. Note that it assumes that the output layer is a Dense(activation='softmax') and that it ignores y_pred. Thus during training and evaluation, where the loss is used, the actual output of the Dense layer is effectively unused.
The output layer is used when doing predictions.
import tensorflow as tf

class SampledSoftmaxLoss(object):
    """The loss function implements the Dense layer matmul and activation
    when in training mode.
    """
    def __init__(self, model):
        self.model = model
        output_layer = model.layers[-1]
        self.input = output_layer.input
        # [kernel, bias] of the output Dense layer
        self.weights = output_layer.weights

    def loss(self, y_true, y_pred, **kwargs):
        # y_true is one-hot; convert it to class indices of shape (batch, 1)
        labels = tf.argmax(y_true, axis=1)
        labels = tf.expand_dims(labels, -1)
        # y_pred is deliberately ignored; the loss reads the layer input instead
        loss = tf.nn.sampled_softmax_loss(
            weights=self.weights[0],
            biases=self.weights[1],
            labels=labels,
            inputs=self.input,
            num_sampled=3,
            num_classes=4,
            partition_strategy="div",
        )
        return loss
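For intuition about what the sampled loss computes in place of the full softmax, here is a minimal pure-Python sketch. It is not the TensorFlow implementation: the helper names (`dot`, `full_softmax_loss`, `sampled_softmax_loss`) and the uniform negative sampling are assumptions made for illustration, and the real op also corrects the logits for the sampling distribution, which this sketch leaves out. Weights are stored one row per class.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def full_softmax_loss(hidden, weights, biases, label):
    """Cross-entropy over every class; cost grows with the number of classes."""
    logits = [dot(hidden, w) + b for w, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def sampled_softmax_loss(hidden, weights, biases, label, num_sampled, rng):
    """Cross-entropy over the true class plus num_sampled random negatives.

    Illustrative only: negatives are drawn uniformly and no sampling
    correction is applied.
    """
    negatives = [c for c in range(len(weights)) if c != label]
    candidates = [label] + rng.sample(negatives, num_sampled)
    logits = [dot(hidden, weights[c]) + biases[c] for c in candidates]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]  # the true class sits at position 0
```

When `num_sampled` covers every other class, the two functions agree; with fewer negatives the sampled value is a cheaper quantity that never exceeds the full loss in this uncorrected form.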
Model:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def make_model():
    inp = Input(shape=(10,))
    h1 = Dense(16, activation='relu')(inp)
    h2 = Dense(4, activation='linear')(h1)
    # output layer and last hidden layer must have the same dims
    out = Dense(4, activation='softmax')(h2)
    model = Model(inp, out)
    loss_calculator = SampledSoftmaxLoss(model)
    model.compile('adam', loss_calculator.loss)
    return model

tf.set_random_seed(42)
model = make_model()
model.summary()
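Note that SampledSoftmaxLoss requires the input of the last model layer to have the same dimension as the number of classes. The reason is that the Dense kernel has shape (input_dim, units), while tf.nn.sampled_softmax_loss expects weights of shape (num_classes, dim); the kernel is passed in without a transpose, so the shapes only line up when the two sizes are equal.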
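The shape constraint can be checked with plain tuples. The sketch below is only an illustration; the three helper names are hypothetical and nothing here calls TensorFlow. It encodes the two documented layouts and tests whether a Dense kernel can be handed to the sampled loss without transposing.

```python
def dense_kernel_shape(input_dim, units):
    # A Keras Dense layer stores its kernel as (input_dim, units).
    return (input_dim, units)


def sampled_softmax_weight_shape(num_classes, dim):
    # tf.nn.sampled_softmax_loss documents its weights as (num_classes, dim).
    return (num_classes, dim)


def kernel_usable_untransposed(hidden_dim, num_classes):
    """True when the output Dense kernel already has the expected layout."""
    kernel = dense_kernel_shape(hidden_dim, num_classes)
    expected = sampled_softmax_weight_shape(num_classes, hidden_dim)
    return kernel == expected


print(kernel_usable_untransposed(4, 4))   # hidden size equals class count
print(kernel_usable_untransposed(16, 4))  # hidden size differs from class count
```

One possible way to lift the constraint, not tested here, would be to pass the transposed kernel (for example `tf.transpose(self.weights[0])`) as `weights`, so that a hidden size different from the number of classes still yields a (num_classes, dim) tensor.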