Is it possible to enforce mathematical constraints between tensorflow neural network output nodes in the last layer?
For example, monotonicity between nodes, such as output node 1 being larger than node 2, which in turn is larger than node 3, and so forth.
In general -- not really, not directly, at least. Keras layers support arguments for constraints on the weights, and you may be able to translate a desired output constraint into a weight constraint instead -- but otherwise, you will need to think about how to set up the structure of your network such that the constraints are fulfilled.
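For completeness, this is how a weight constraint is attached in Keras (a minimal sketch; `NonNeg` is just one of the built-in constraints, and it only helps if your output constraint can genuinely be rephrased as a constraint on the weights):

import tensorflow as tf

# Sketch: a Dense layer whose kernel and bias are kept non-negative.
# Note that this constrains the *weights*, not the outputs.
constrained_dense = tf.keras.layers.Dense(
    5,
    kernel_constraint=tf.keras.constraints.NonNeg(),
    bias_constraint=tf.keras.constraints.NonNeg(),
)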
Here is a sketch of how a monotonicity constraint might be implemented. Actually including this in a model likely requires creating a custom `Layer` subclass or perhaps using the functional API (see the combined sketch at the end of this answer).
First, let's create some dummy data. This could be the output of a standard `Dense` layer (4 is the batch size, 5 the number of outputs).
raw_outputs = tf.random.normal([4, 5])
>>> <tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[ 0.3989258 , -1.7693167 , 0.13419539, 1.1059834 , 0.3271042 ],
[ 0.6493515 , -1.4397207 , 0.05153034, -0.2730962 , -1.1569825 ],
[-1.3043666 , 0.20206456, -0.3841469 , 1.8338723 , 1.2728293 ],
[-0.3725195 , 1.1708363 , -0.01634515, -0.01382025, 1.2707714 ]],
dtype=float32)>
Next, make all outputs positive using `softplus`. Think of this as the output activation function. Any function that returns non-negative values will do (strictly positive if you need strictly decreasing outputs); for example, you could use `tf.exp`, but the exponential growth might lead to numerical issues. I would not recommend `relu`, since the hard 0s prevent gradients from flowing -- usually a bad idea in the output layer.
positive_outputs = tf.nn.softplus(raw_outputs)
>>> <tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[0.9123723 , 0.15738781, 0.7624942 , 1.3918277 , 0.8700147 ],
[1.0696293 , 0.21268418, 0.71924424, 0.56589293, 0.2734058 ],
[0.24007489, 0.7992745 , 0.5194075 , 1.9821143 , 1.5197192 ],
[0.5241344 , 1.4409455 , 0.685008 , 0.68626094, 1.5181118 ]],
dtype=float32)>
Finally, use `cumsum` to add up the values:
constrained = tf.cumsum(positive_outputs, reverse=True, axis=-1)
>>> <tf.Tensor: shape=(4, 5), dtype=float32, numpy=
array([[4.0940967, 3.1817245, 3.0243368, 2.2618425, 0.8700147],
[2.8408566, 1.7712271, 1.558543 , 0.8392987, 0.2734058],
[5.0605907, 4.8205156, 4.021241 , 3.5018334, 1.5197192],
[4.8544607, 4.3303266, 2.889381 , 2.204373 , 1.5181118]],
dtype=float32)>
As we can see, the outputs for each batch element are monotonically decreasing! This is because each of the original outputs (`positive_outputs`) just encodes how much is added at each position, and because we forced those values to be positive, the cumulative sums can only grow (or, in this case, decrease from left to right because of `reverse=True` in `cumsum`).
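You can also verify the property numerically: tf.reduce_all(constrained[:, :-1] >= constrained[:, 1:]) evaluates to True for the tensor above.

To actually plug this into a model, one option is to wrap the two operations in a small custom layer, roughly like this (a sketch only; the layer name MonotonicDecreasing and the surrounding architecture are made up for illustration):

import tensorflow as tf

# Sketch of a custom layer wrapping the softplus + cumsum idea above.
class MonotonicDecreasing(tf.keras.layers.Layer):
    def call(self, inputs):
        positive = tf.nn.softplus(inputs)  # non-negative "increments"
        # Cumulative sum from right to left -> values decrease along the last axis.
        return tf.cumsum(positive, reverse=True, axis=-1)

# Example usage with the functional API (architecture chosen arbitrarily).
inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
raw = tf.keras.layers.Dense(5)(hidden)        # unconstrained raw outputs
outputs = MonotonicDecreasing()(raw)          # monotonically decreasing outputs
model = tf.keras.Model(inputs, outputs)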