Search code examples
feature-extractionfeature-detectionorb

ORB Feature Descriptor Official Paper Explanation


I was just reading the official paper of ORB from Ethan Rublee Official Paper and somewhat I find hard to understand the section of "4.3 Learning Good Binary Features"

I was surfing over the Internet to dig much deep into it and I found the below paragraph. I haven't getting the practical explanation of this. Can any of you explain me this in a simple terms.

"Given a local image patch in size of m × m, and suppose the local window (i.e., the box filter used in BRIEF) used for intensity test is of size r × r , there are N = (m − r )2 such local windows.

Each two of them can define an intensity test, so we have C2N bit features. In the original implementation of ORB, m is set to 31, generating 228,150 binary tests. After removing tests that overlap, we finally have a set of 205,590 candidate bit features. Based on a training set, ORB selects at most 256 bits according to Greedy algorithm."

What am getting from the official paper and from the above paragraph is that.

We have a patch size of 31X31 and select a size of 5X5.. We will have N=(31-5)^2 = 676 possible Sub Windows. Am not getting the lines which are marked in bold. What does it mean by removing test that overlap, we get 205,590 bit Features?


Solution

  • Imagine a small image with size 31x31 (patch) and a small 5x5 window. How many different positions this window can be placed into the image? If you slide it 1 by 1 pixel then it can be placed in (31-5)^2 = 676 different positions, right? Combining only central pixels of 676 windows by 2 elements you have 676!/(2!*(676-2)!) = 228,150 combinations. In case of ORB descriptor they were not interested in slide the window in 1 by 1 pixel, it could be so much noised because of overlap between some windows (they are much near). Then they removed overlapping windows sliding it 5 by 5 pixels and used their central pixels to create binary tests, what reduced total combinations to 205,590.