Tags: deep-learning, computer-vision, geometry, object-detection, bounding-box

Calculate distance between camera and different-sized objects


I have been trying to develop a small object detection system for my college project. The idea is that I have a robot that can pick up one particular "object" from its surroundings. For this purpose I am using a single camera with known intrinsic parameters. I have already developed an object detection system that predicts bounding box coordinates. Using these coordinates and the size of the bounding box, I can estimate the perceived depth with the "triangle similarity" method. The problem I am facing is that this particular "object" can vary in size, which means objects located at the same distance can have different-sized bounding boxes.
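For reference, the triangle-similarity estimate described above can be sketched as follows; the focal length and object width below are hypothetical values, not ones from the question:

```python
def distance_from_width(focal_px: float, known_width_m: float, bbox_width_px: float) -> float:
    """Triangle similarity: an object of real width W at distance D
    projects to w = f * W / D pixels, so D = f * W / w."""
    return focal_px * known_width_m / bbox_width_px

# Example: a 0.30 m wide object, 600 px focal length, 90 px wide bounding box
d = distance_from_width(600.0, 0.30, 90.0)
print(d)  # 2.0 m
```

This is exactly why the estimate breaks down when the object's real width `known_width_m` is not fixed: the same bounding box width is consistent with many (width, distance) pairs.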

What other way is there to get a rough estimate of the distance from the camera to the object, given that the object doesn't have a fixed size?


Solution

  • Cannot be done in general, since scale information is lost in camera projection.

    Depending on your particular case, you may be able to use more indirect methods to infer distance. For example, if the object rests on a ground plane, you may be able to exploit knowledge of the camera's height and orientation relative to that plane, or of the shape and size of patterns on the floor. More sophisticated methods were analyzed many years ago; the general subject goes under the heading of "single-view metrology". A good reference is Antonio Criminisi's 1999 PhD thesis.
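As a minimal sketch of the ground-plane idea: if the camera's height above the floor and its pitch are known, and the bottom edge of the bounding box is assumed to touch the floor, you can intersect the back-projected ray through that pixel row with the ground plane. The function name and the numbers below are hypothetical; `fy`/`cy` are the focal length and principal point from the camera intrinsics:

```python
import math

def ground_distance(v_bottom: float, fy: float, cy: float,
                    cam_height: float, pitch_rad: float = 0.0) -> float:
    """Horizontal distance to a point on the floor seen at pixel row v_bottom.

    For a pinhole camera, a pixel row v corresponds to a depression angle of
    atan((v - cy) / fy) below the optical axis; adding the camera's downward
    pitch gives the total angle below the horizon. The ray hits the ground
    plane at distance d where tan(angle) = cam_height / d.
    """
    angle = math.atan2(v_bottom - cy, fy) + pitch_rad
    if angle <= 0.0:
        raise ValueError("ray does not intersect the ground in front of the camera")
    return cam_height / math.tan(angle)

# Example: level camera 0.5 m above the floor, fy = 600 px, cy = 240 px,
# bounding box bottom at row 340
d = ground_distance(340.0, 600.0, 240.0, 0.5)
print(d)  # 3.0 m
```

Unlike the triangle-similarity estimate, this does not depend on the object's size at all, only on the assumption that it sits on a known plane.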