tensorflow deep-learning object-detection

How to choose training data to find many small instances of an object in a larger image (Mask R-CNN)?

Let's say I have the following task: find all the windows in an image. Can I train on many images that each show a single window, and then use the model to find many windows in one image (for example, a photo of a block of flats)? If yes, how should I choose the size of each training image? And what size can my validation image be?

Solution

Technically, you can. The problem is that a model trained this way will expect to find exactly one window per image. You can work around this by cropping the test image (the block of flats) into smaller parts and trying to predict a window in each of those crops. That said, you'll also have to augment your single-window images in many ways: blur them, resize them, rotate or skew them, add artificial noise, and so on. Even then, it definitely won't be perfect. I suggest you use YOLOv2, YOLOv3, or another object detection model and train it on images of flats with lots of windows.
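If you do go the cropping route, the tiling step could look roughly like the sketch below. It only computes where each crop starts, so it needs nothing beyond plain Python. The function names `tile_origins` and `shift_box`, the tile size of 256 and the overlap of 64 are all assumptions made for illustration; they are not taken from Mask R-CNN or any other library.

```python
def _starts(length, tile_size, stride):
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile_size:
        return [0]
    positions = list(range(0, length - tile_size, stride))
    # Shift the last tile back so it ends exactly at the image border.
    positions.append(length - tile_size)
    return positions


def tile_origins(width, height, tile_size, overlap=0):
    """Return the (x, y) top-left corner of every tile in a sliding grid.

    `overlap` is the number of pixels shared by neighbouring tiles, so a
    window that is cut by one tile border can still appear whole in another.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be in [0, tile_size)")
    stride = tile_size - overlap
    return [(x, y)
            for y in _starts(height, tile_size, stride)
            for x in _starts(width, tile_size, stride)]


def shift_box(box, tile_x, tile_y):
    """Map an (x1, y1, x2, y2) box from tile coordinates to image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + tile_x, y1 + tile_y, x2 + tile_x, y2 + tile_y)


# Hypothetical 1000x600 photo of a block of flats, cut into 256x256 tiles.
origins = tile_origins(width=1000, height=600, tile_size=256, overlap=64)
```

With the image loaded as a NumPy array, each crop would then be `image[y:y + 256, x:x + 256]` for every `(x, y)` in `origins`, and any box predicted inside a crop can be moved back to full-image coordinates with `shift_box`. Because the tiles overlap, the same window may be reported more than once, so duplicate detections would still need to be merged afterwards.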