Disclaimer: I know nothing about CNNs or deep learning, and I don't know Torch.
I'm using SIFT for my object recognition application. I found the paper Discriminative Learning of Deep Convolutional Feature Point Descriptors, which is particularly interesting because it is CNN-based, and CNN-based descriptors are more precise than classic image description methods (e.g. SIFT, SURF). Even better (quoting the abstract):
using the L2 distance during both training and testing we develop 128-D descriptors whose euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT
Wow, that's fantastic: it means we can keep using any SIFT-based approach, but with more precise descriptors!
However, quoting the github code repository README:
Note the output will be a Nx128 2D float tensor where each row is a descriptor.
Well, what is a "2D float tensor"? A SIFT descriptor matrix is also Nx128 floats. Is there something I am missing?
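2D float tensor = 2D float matrix. In Torch, a tensor is an n-dimensional array, and with two dimensions it is an ordinary matrix. So the Nx128 output has the same layout as a SIFT descriptor matrix: one 128-float descriptor per row.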
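To make the "drop-in replacement" point concrete, here is a minimal sketch in plain Python (not Torch, and not the repository's code). It assumes the descriptors have already been exported as an Nx128 list of float rows; `match_descriptors` and the random data are made up for illustration. The matcher only sees rows of 128 floats and compares them with the L2 distance, so it cannot tell whether they came from SIFT or from the CNN.

```python
import math
import random

DESCRIPTOR_DIM = 128  # SIFT and the paper's descriptors are both 128-D


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, train):
    """Brute-force nearest-neighbour matching.

    For each row of `query`, return (index of the closest row in `train`,
    L2 distance to it). Both arguments are Nx128 matrices stored as lists
    of rows; where the floats came from does not matter.
    """
    matches = []
    for q in query:
        distances = [l2_distance(q, t) for t in train]
        best = min(range(len(train)), key=distances.__getitem__)
        matches.append((best, distances[best]))
    return matches


# Stand-in data: 5 random "descriptors" (hypothetical, for illustration only).
random.seed(0)
train = [[random.random() for _ in range(DESCRIPTOR_DIM)] for _ in range(5)]

# Two query descriptors: slightly perturbed copies of train rows 3 and 1.
query = [[v + 0.001 for v in train[3]], [v - 0.001 for v in train[1]]]

matches = match_descriptors(query, train)
print([index for index, _ in matches])
```

Each query row is matched back to the train row it was copied from. With real data you would load the Nx128 tensor written by the Torch script into whatever matrix type your SIFT pipeline already uses and leave the matching code unchanged.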