python tensorflow image-processing mathematical-morphology image-morphology

TensorFlow dilation behaves differently from morphological dilation


As the following piece of code shows, TensorFlow's tf.nn.dilation2d function doesn't behave like a conventional dilation operator.

import tensorflow as tf
tf.InteractiveSession()
A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]
kernel = tf.ones((3,3,1))
input4D = tf.cast(tf.expand_dims(tf.expand_dims(A, -1), 0), tf.float32)
output4D = tf.nn.dilation2d(input4D, filter=kernel, strides=(1,1,1,1), rates=(1,1,1,1), padding="SAME")
print(tf.cast(output4D[0,:,:,0], tf.int32).eval())

Returns the following tensor:

array([[1, 1, 1, 2, 2, 2, 1],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2],
       [1, 1, 1, 2, 2, 2, 1],
       [1, 1, 1, 1, 1, 1, 1]], dtype=int32)

I understand neither why it behaves like that nor how I should use tf.nn.dilation2d to obtain the expected output:

array([[0, 0, 0, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int32)

Can someone clarify TensorFlow's terse documentation and explain what the tf.nn.dilation2d function does?


Solution

  • As the linked documentation page states,

    Computes the grayscale dilation of 4-D input and 3-D filter tensors.

    and

    In detail, the grayscale morphological 2-D dilation is the max-sum correlation [...]

    This means that, at each position, the kernel's values are added to the image values under the kernel, and the maximum of these sums is taken as the output value.

    Compare this to correlation, replacing the multiplication with an addition, and the integral (or sum) with the maximum:

          correlation: g(t) = ∫ f(𝜏) h(𝜏-t) d𝜏

          dilation: g(t) = max_𝜏 { f(𝜏) + h(𝜏-t) }

    Or in the discrete world:

          correlation: g[n] = ∑_k f[k] h[k-n]

          dilation: g[n] = max_k { f[k] + h[k-n] }
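The discrete max-sum formula can be checked by hand with a few lines of plain Python. This is only an illustrative sketch, not TensorFlow's implementation: the helper name max_sum_dilation is hypothetical, the kernel origin is assumed to be at its centre, and pixels outside the image are assumed to be ignored. Run with the kernel of ones from the question, it should give the same values as the tensor the question reports.

```python
def max_sum_dilation(image, kernel):
    """Grey-value dilation as a max-sum correlation (hypothetical helper).

    Assumes the kernel origin is at its centre and skips out-of-image pixels.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sums = []
            for dr in range(k_rows):
                for dc in range(k_cols):
                    rr, cc = r + dr - r_off, c + dc - c_off
                    if 0 <= rr < rows and 0 <= cc < cols:
                        # Add the kernel value to the image value ...
                        sums.append(image[rr][cc] + kernel[dr][dc])
            # ... and keep the maximum of all the sums.
            out_row.append(max(sums))
        output.append(out_row)
    return output


A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]

# Same kernel values as tf.ones((3,3,1)) in the question.
ones_kernel = [[1, 1, 1] for _ in range(3)]
dilated_with_ones = max_sum_dilation(A, ones_kernel)
for row in dilated_with_ones:
    print(row)
```

Every output value here is a local maximum of A plus the kernel value 1, which is where the 1s and 2s come from.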


    What the question refers to as a “conventional dilation” is a dilation with a binary structuring element (kernel), one that contains only 1s and 0s. These indicate “included” and “excluded” pixels. That is, the 1s determine the domain of the structuring element.

    To recreate the same behavior with a grey-value dilation, set the “included” pixels to 0 and the “excluded” pixels to minus infinity.

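    For example, the 3x3 square structuring element used in the question should be a 3x3 matrix of zeros. The kernel of ones used in the question instead adds 1 to every local maximum, which is why its output contains 1s and 2s rather than 0s and 1s.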
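Applied to the question's example, the fix might look like the sketch below. It assumes TensorFlow 2.x with eager execution, where the keyword arguments are filters and dilations and data_format must be given explicitly; the question's code uses the older 1.x names filter and rates, so adapt the call to your version. The only substantive change is tf.zeros in place of tf.ones.

```python
import tensorflow as tf

A = [[0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0]]

# Shape (batch, height, width, channels).
input4d = tf.reshape(tf.constant(A, dtype=tf.float32), (1, 6, 7, 1))

# Flat 3x3 structuring element: 0 marks an "included" pixel.
kernel = tf.zeros((3, 3, 1), dtype=tf.float32)

output4d = tf.nn.dilation2d(
    input4d,
    filters=kernel,
    strides=(1, 1, 1, 1),
    padding="SAME",
    data_format="NHWC",
    dilations=(1, 1, 1, 1),
)

result = tf.cast(output4d[0, :, :, 0], tf.int32).numpy()
print(result)
```

For a structuring element that is not a full square, the “excluded” positions of the kernel would be set to minus infinity (for example float("-inf")) instead of 0.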