I want to use the Spatial Transformer with TensorFlow without the localization net and instead feed it a given transformation matrix theta. I tried the transformer with the identity matrix, which should not change the input. Unfortunately, that isn't the case: if I repeatedly apply the transformation to its own output, the image is scaled smaller and smaller into the top-left corner. I would expect some discretization error, but why does it occur for the identity transformation?
import tensorflow as tf
from spatial_transformer import transformer  # transformer() from spatial_transformer.py

inp = tf.placeholder(tf.float32, [None, H, W, C], name="input_image")  # "in" is a reserved word in Python, so renamed
theta = tf.placeholder(tf.float32, [None, 6], name="input_theta")      # flattened 2x3 affine matrix
transformed = transformer(inp, theta, [H, W])
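For completeness, this is roughly how I feed the identity transformation (a minimal sketch; batch_size, img_batch and sess stand in for my actual batch size, image batch and session):

import numpy as np

# identity affine transform [[1, 0, 0], [0, 1, 0]], flattened row-wise, one row per batch element
identity_theta = np.tile(np.array([1., 0., 0., 0., 1., 0.], dtype=np.float32), (batch_size, 1))
output = sess.run(transformed, feed_dict={inp: img_batch, theta: identity_theta})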
Ok, I think I've found an answer:
The problem is the use of height- and width-normalized coordinates, such that -1 <= x, y <= 1, and more specifically the scaling back to matrix coordinates 0 <= x < width and 0 <= y < height, which is done as follows in the _interpolate method:
x = (x + 1.0)*(width_f) / 2.0
y = (y + 1.0)*(height_f) / 2.0
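To see where the shrink comes from: _meshgrid samples the normalized target coordinates with tf.linspace(-1.0, 1.0, width), so for the identity theta the source coordinate of output pixel i is x_s[i] = -1 + 2*i/(width-1). The back-scaling above then gives x[i] = i*width/(width-1), i.e. output pixel i reads from an input position slightly to the right of (and below) pixel i, so every pass shifts the content toward the origin and shrinks it by a factor of (width-1)/width. A small NumPy sketch of the same arithmetic (W = 8 is just an example):

import numpy as np

W = 8
x_t = np.linspace(-1.0, 1.0, W)       # normalized target coordinates from _meshgrid
x_s = x_t                             # identity theta: source coords equal target coords
x = (x_s + 1.0) * W / 2.0             # back-scaling from _interpolate
print(x)                              # [0. 1.1428... 8.] -- spans [0, W] instead of [0, W-1]
print(np.arange(W) * W / (W - 1.0))   # identical: output pixel i samples input position i*W/(W-1)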
Changing the tf.linspace calls in the _meshgrid method to go from 0 to width-1 and height-1, respectively, and removing the back-scaling mentioned above solves my problem.
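Roughly, the two edits look like this (a sketch against the spatial_transformer.py from the TensorFlow models repository; only the affected pieces are shown, and the variable names follow that file):

# in _meshgrid: the two tf.linspace calls change from normalized to pixel coordinates
# (in the real file they are wrapped in the grid construction; only the linspace part is shown)
x_t_vals = tf.linspace(0.0, tf.cast(width, 'float32') - 1.0, width)    # was tf.linspace(-1.0, 1.0, width)
y_t_vals = tf.linspace(0.0, tf.cast(height, 'float32') - 1.0, height)  # was tf.linspace(-1.0, 1.0, height)

# in _interpolate: the sampling coordinates are already pixel coordinates,
# so the back-scaling is simply removed:
# x = (x + 1.0)*(width_f) / 2.0
# y = (y + 1.0)*(height_f) / 2.0

Note that after this change theta acts on pixel coordinates rather than normalized ones (translations are in pixels, scaling is about the top-left corner instead of the image center), which is fine for my identity use case.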