How to interpret the output of deepmask getTopScores function


I am using facebookresearch's deepmask to segment an image, and I am trying to modify the computeProposals.lua module to segment my own images. The key function is getTopProps; in the code block below it is used to find the masks associated with the top proposals.

-- get top proposals
local masks,_ = infer:getTopProps(.2,h,w)

The infer class is from the InferDeepMask.lua module. getTopProps returns two things: the masks and the scores. The scores come from the Infer:getTopScores() function.

Question: How do I interpret the output of the getTopScores function?

The code comment:

-- each line contains: the score value, the scaleNb and position(of M(:))

Example output (converted into a numpy array):

[[  0.9942829    2.          26.           6.        ]
 [  0.9942829    3.          26.           6.        ]
 [  0.98620307   2.           1.          29.        ]
 [  0.98620307   3.           1.          29.        ]
 [  0.97150999   2.          19.           8.        ]
 [  0.97150999   3.          19.           8.        ]
 [  0.97141284   2.          18.           8.        ]
 [  0.97141284   3.          18.           8.        ]
 [  0.9639107    2.          15.          11.        ]
 [  0.9639107    3.          15.          11.        ]]

The first column is clearly a score out of one. What is scaleNb, and what are the positions of M(:)? They are not the pixel locations on the mask.

NB: each mask has shape (336, 448).

Thanks!


Solution

  • By looking at the code:

    • the score is fairly self-explanatory

    • the scaleNb is the index of the scale at which the proposal was found, i.e. the index k in the loop for k = 1,nScales do, where the table holding the values of the different scales is built by the line for scale = -3,2,.25 do table.insert(self.scales,scale)

    • M(:) is the tricky part. If I understand the code correctly, pos is a tensor of nScales ones, and its k-th element is incremented every time an object at scale k is selected (candidates are taken in order of score). temp, the variable that is then used to compute the x and y values forming what they call M(:), is set by the line local temp=sortedIds[pos[scale]][scale]. What does it contain? It appears to hold the position of the mask at the scale where it was detected. See the following piece of code:

      local sc=sc:view(h*w)
      local sS,sIds=torch.sort(sc,true)
      local sz = sS:size(1)
      sortedScores:narrow(2,s,1):narrow(1,1,sz):copy(sS)
      sortedIds:narrow(2,s,1):narrow(1,1,sz):copy(sIds)

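    Since sc is flattened to h*w elements before torch.sort is called, sortedIds appears to hold, for each scale, the flat indices into that scale's score map, ordered by decreasing score. temp is therefore the flat index of the score currently being evaluated, and the x,y values are derived from it. They are positions in the score map at that scale, not pixel locations on the mask.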
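To make the selection logic concrete, here is a minimal plain-Lua sketch (no Torch required) of what the steps above seem to do. It is not the DeepMask implementation: sortDescending, topScores and the toy score maps are hypothetical names and data made up for illustration, and the sketch stops at the flat index temp rather than converting it to x and y.

```lua
-- Scale table built as in the quoted loop; scaleNb is an index into it.
local scales = {}
for scale = -3, 2, .25 do table.insert(scales, scale) end

-- Sort one flattened score map in descending order and keep the original
-- 1-based flat index of every score (the role torch.sort(sc, true) plays).
local function sortDescending(flatScores)
  local ids = {}
  for i = 1, #flatScores do ids[i] = i end
  table.sort(ids, function(a, b) return flatScores[a] > flatScores[b] end)
  local sorted = {}
  for i, id in ipairs(ids) do sorted[i] = flatScores[id] end
  return sorted, ids
end

-- Pick the np best scores across all scales.
-- pos[k] points at the next unused entry of scale k.
local function topScores(scoreMaps, np)
  local sortedScores, sortedIds, pos = {}, {}, {}
  for k, map in ipairs(scoreMaps) do
    sortedScores[k], sortedIds[k] = sortDescending(map)
    pos[k] = 1
  end

  local rows = {}
  for i = 1, np do
    local scale, score = 0, 0
    for k = 1, #scoreMaps do
      local candidate = sortedScores[k][pos[k]]
      if candidate and candidate > score then
        score, scale = candidate, k
      end
    end
    if scale == 0 then break end            -- every scale is exhausted
    local temp = sortedIds[scale][pos[scale]] -- flat index within that scale
    pos[scale] = pos[scale] + 1
    rows[i] = {score, scale, temp}
  end
  return rows
end

-- Toy input (made up): two scales, each a flattened score map.
local rows = topScores({ {0.1, 0.9, 0.3}, {0.8, 0.2, 0.95, 0.4} }, 3)
for _, r in ipairs(rows) do print(r[1], r[2], r[3]) end
```

With this toy input the best score is 0.95 at scale 2 (flat index 3), followed by 0.9 at scale 1 (flat index 2) and 0.8 at scale 2 (flat index 1). Each row has the same layout as the getTopScores output: the score, the scaleNb, and a position, here left as a single flat index. With the quoted loop, a scaleNb of 2 or 3 would correspond to the table entries -2.75 and -2.5.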