deep-learning object-detection feature-extraction video-tracking

Does the object detector in the DeepSORT object tracking framework run on every frame of a video?

I am trying to track objects using the DeepSORT algorithm described in this paper. My understanding is that two deep-learning models are at work here: an object detector (e.g. YOLO) and a feature extractor. The object detector detects the objects present in a frame, while the feature extractor helps decide whether a detected object has already been seen in earlier frames and, if so, assigns the detection to the corresponding track.

However, one thing I do not understand is when the object detector runs. It clearly has to run on the first frame, but after that, does it run only on every nth frame? Or does it run on every frame, but only around the approximate location predicted by the tracker?

Thanks.

Solution

The object detector (e.g. YOLO) runs on every frame of the video and is independent of the tracker, i.e. DeepSORT. DeepSORT takes the detections produced by the object detector for each frame and tries to associate them with the tracks it has built from the earlier frames.

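It is during this association step that DeepSORT's feature extractor comes into play: it supplies an appearance descriptor for each detection, and the Hungarian algorithm then solves the resulting assignment problem between detections and existing tracks to give the best association and tracking results.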
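To make the per-frame flow concrete, here is a minimal sketch in Python. It is not the DeepSORT implementation: the names `associate`, `track_video` and `detect` are hypothetical, `detect` is assumed to return one appearance feature vector per detected object (detector and feature extractor rolled into one call), and a brute-force search over permutations stands in for the Hungarian algorithm so that the example needs only the standard library. The Kalman filter motion model, the matching cascade and track deletion are all left out.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def associate(track_feats, det_feats, max_distance=0.3):
    """Match tracks to detections by minimum total appearance cost.

    Brute force over permutations replaces the Hungarian algorithm here;
    it finds the same optimum but only scales to a handful of objects.
    Returns (matches, unmatched_detection_indices).
    """
    n_t, n_d = len(track_feats), len(det_feats)
    n = max(n_t, n_d)
    gate = max_distance + 1e-5  # cost for "no acceptable match"
    # Square cost matrix; padded cells and gated pairs all cost `gate`.
    cost = [[gate] * n for _ in range(n)]
    for i in range(n_t):
        for j in range(n_d):
            cost[i][j] = min(cosine_distance(track_feats[i], det_feats[j]), gate)

    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    # Keep only real track/detection pairs that passed the gate.
    matches = [
        (i, best[i])
        for i in range(n_t)
        if best[i] < n_d and cost[i][best[i]] <= max_distance
    ]
    matched_dets = {j for _, j in matches}
    unmatched = [j for j in range(n_d) if j not in matched_dets]
    return matches, unmatched


def track_video(frames, detect, max_distance=0.3):
    """Run the detector on EVERY frame, then associate with known tracks."""
    tracks = {}   # track id -> most recent appearance feature
    next_id = 0
    history = []  # per frame: track id assigned to each detection
    for frame in frames:
        detections = detect(frame)  # no frame skipping, no search window
        ids = list(tracks)
        matches, unmatched = associate(
            [tracks[i] for i in ids], detections, max_distance
        )
        frame_ids = [None] * len(detections)
        for t_idx, d_idx in matches:
            tracks[ids[t_idx]] = detections[d_idx]  # refresh appearance
            frame_ids[d_idx] = ids[t_idx]
        for d_idx in unmatched:
            tracks[next_id] = detections[d_idx]     # start a new track
            frame_ids[d_idx] = next_id
            next_id += 1
        history.append(frame_ids)
    return history
```

The point to notice is that `detect` is called once per frame inside the loop, over the whole frame, and the tracker only consumes its output. For real workloads the permutation search would normally be replaced by `scipy.optimize.linear_sum_assignment`, which solves the same assignment problem efficiently.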

You can find a detailed explanation on: