I want to align feature maps using ego motion, as described in the paper "An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds".
I use VoxelNet as the backbone, which downsamples the image by a factor of 8. My voxel size is 0.1 m x 0.1 m x 0.2 m (height).
So given an input bird's-eye-view image of size 1408 x 1024, the extracted feature map is 176 x 128, i.e. downsampled by 8.
The ego translation of the car between the "images" (actually point clouds) is 1 meter in both the x and y directions. Am I right that I should shift the feature map by 1.25 pixels?
1 m / 0.1 m = 10  # meters to input-image pixels
10 / 8 = 1.25     # divided by the network downsampling factor -> feature-map pixels
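As a sanity check, this is the arithmetic I mean (the variable names are just for illustration):

```python
voxel_size_xy = 0.1     # m per input BEV pixel
backbone_stride = 8     # VoxelNet downsampling factor

input_w, input_h = 1408, 1024
feat_w, feat_h = input_w // backbone_stride, input_h // backbone_stride  # 176, 128

ego_translation = 1.0   # m, in both x and y
shift_input_px = ego_translation / voxel_size_xy   # 10 input pixels
shift_feat_px = shift_input_px / backbone_stride   # 1.25 feature-map pixels
```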
However, through experiments I found that the feature maps align better if I shift them by only 1/32 of a pixel for the 1-meter real-world translation.
P.S. I am using the function torch.nn.functional.affine_grid to perform the translation; it takes a 2x3 affine matrix as input.
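For reference, this is roughly how I apply the shift (a minimal sketch of my usage; the names `shift_feature_map`, `tx`, and `ty` are just for illustration, and choosing the right values for `tx`/`ty` is exactly what my question is about):

```python
import torch
import torch.nn.functional as F

def shift_feature_map(feat, tx, ty):
    # feat: (N, C, H, W); tx, ty are the translation entries of the 2x3 matrix
    n = feat.size(0)
    theta = torch.tensor([[1.0, 0.0, tx],
                          [0.0, 1.0, ty]], dtype=feat.dtype, device=feat.device)
    theta = theta.unsqueeze(0).repeat(n, 1, 1)            # (N, 2, 3)
    grid = F.affine_grid(theta, feat.size(), align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)
```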
It was caused by the function torch.nn.functional.affine_grid I used. I didn't fully understand this function before using it. These vivid images are very helpful for showing what this function actually does (in comparison with the affine transformations in NumPy).
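If I understand it correctly now, the key difference from a NumPy-style affine transform is that affine_grid works in normalized coordinates: both image axes span [-1, 1], so the translation entries of the 2x3 matrix are fractions of half the image size, not pixels. A sketch of the conversion I should have done (assuming align_corners=False; the helper name is mine):

```python
import torch
import torch.nn.functional as F

def pixel_shift_to_theta(shift_x_px, shift_y_px, width, height):
    # affine_grid uses normalized coordinates where each axis spans [-1, 1],
    # so one pixel corresponds to 2/width (or 2/height) with align_corners=False.
    # The grid maps output positions to input sampling positions, so translating
    # the sampling grid by -shift moves the content by +shift.
    tx = -2.0 * shift_x_px / width
    ty = -2.0 * shift_y_px / height
    return torch.tensor([[1.0, 0.0, tx],
                         [0.0, 1.0, ty]])

# A 1.25-pixel shift on a 176 x 128 feature map is a translation of about
# 2 * 1.25 / 176 ≈ 0.014 in normalized units, not 1.25.
theta = pixel_shift_to_theta(1.25, 1.25, width=176, height=128)
feat = torch.randn(1, 64, 128, 176)                        # (N, C, H, W)
grid = F.affine_grid(theta.unsqueeze(0), feat.size(), align_corners=False)
shifted = F.grid_sample(feat, grid, align_corners=False)
```

So the correct translation is on the order of a few hundredths in normalized units; the roughly 1/32 value I found empirically is in that same ballpark, which would explain why it looked much better aligned than passing 1.25 directly.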