Please help me understand this idea from the paper titled "Scene Summarization for Online Image Collections" by Ian Simon, Noah Snavely, and Steven M. Seitz, University of Washington.
Computing the Feature-Image Matrix :
We first transform the set of views into a feature-image
incidence matrix. To do so, we use the SIFT keypoint detector
to find feature points in all of the images in V. The
feature points are represented using the SIFT descriptor.
Then, for each pair of images, we perform feature matching
on the descriptors to extract a set of candidate matches.
We further prune the set of candidates by estimating a fundamental
matrix using RANSAC and removing all inconsistent
matches. After the previous step is complete for all images,
we organize the matches into tracks,
where a track is a connected component of features. We remove
tracks containing fewer than two features total, or at
least two features in the same image. At this point, we consider
each track as corresponding to a single 3D point in S.
From the set of tracks, it is easy to construct the |S|-by-|V|
feature-image incidence matrix.
The part I am confused about is the one in italics.
How do we organize the matches into tracks?
And how do we construct the feature-image incidence matrix?
Please help.
Example of a track across 3 images:
1. Detect features in every image.
2. Perform pairwise matching (1 - 2, 2 - 3, 1 - 3). Now you have correspondences such as FeatureA_img1 = FeatureB_img2, FeatureB_img2 = FeatureC_img3, FeatureE_img1 = FeatureF_img3.
3. Check whether correspondences chain together: if FeatureA_img1 == FeatureB_img2 AND FeatureB_img2 == FeatureC_img3, then you have the same feature in 3 images. Save it as a row in a table with one column per image:

   img1      img2      img3      ...  imgn
   FeatureA  FeatureB  FeatureC  ...

4. Repeat this for all correspondences. The rows in this table are the tracks you are looking for.
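The chaining above is the same as finding connected components in a graph whose nodes are (image, feature) pairs and whose edges are the matches. Here is a minimal sketch of that idea in Python, not the paper's actual code. The names `build_tracks` and `incidence_matrix`, the string feature labels, and the 0-based image indices are all assumptions made for illustration. It uses a small union-find to merge matches, applies the filtering rule quoted in the question (drop tracks with fewer than two features, or with two features in the same image), and then fills a tracks-by-images 0/1 matrix.

```python
from collections import defaultdict


def build_tracks(matches):
    """Group pairwise matches into tracks (connected components).

    matches: iterable of ((img, feat), (img, feat)) pairs, assumed to be
    the matches that already survived the RANSAC pruning step.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every match.
    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Collect the members of each component.
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    tracks = []
    for members in groups.values():
        images = [img for img, _ in members]
        # Keep tracks with at least two features and at most one per image.
        if len(members) >= 2 and len(images) == len(set(images)):
            tracks.append(sorted(members))
    return sorted(tracks)


def incidence_matrix(tracks, num_images):
    """Rows are tracks (3D points), columns are images; 1 = point seen."""
    matrix = [[0] * num_images for _ in tracks]
    for row, track in enumerate(tracks):
        for img, _ in track:
            matrix[row][img] = 1
    return matrix


# Hypothetical matches for 3 images (indices 0, 1, 2).
matches = [
    ((0, "A"), (1, "B")),
    ((1, "B"), (2, "C")),
    ((0, "E"), (2, "F")),
    ((0, "G"), (1, "H")),
    ((1, "H"), (0, "I")),  # puts two features of image 0 in one track
]
tracks = build_tracks(matches)
matrix = incidence_matrix(tracks, num_images=3)
```

With these made-up matches, the A-B-C chain becomes one track visible in all three images, E-F becomes a track visible in images 0 and 2, and the G-H-I component is discarded because it contains two features from image 0. Each surviving track is one row of the incidence matrix, with a 1 in every column whose image contains a feature of that track.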