Tags: python, keras, object-detection

Change anchors to increase IOU for a Region Proposal Network (RPN) using VGG in Keras


Introduction...

My goal is to create a Region Proposal Network (RPN) using VGG as the backbone CNN (I'm open to suggestions of other classifiers to use in the Python Keras framework).

Almost every article I've read says something along the lines of...

Positive anchors are those that have an IoU >= 0.7 with any ground truth object, and negative anchors are those that don't cover any object by more than 0.3 IoU. Anchors in between (i.e. cover an object by IoU >= 0.3 but < 0.7) are considered neutral and excluded from training.
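As I understand it, that rule boils down to something like this (a rough sketch, not taken from any particular implementation, given an anchor's highest IoU with any ground truth box):

    # Rough sketch of the labelling rule described above.
    def label_anchor(max_iou_with_gt, fg_thresh=0.7, bg_thresh=0.3):
        if max_iou_with_gt >= fg_thresh:
            return "foreground"  # positive anchor
        if max_iou_with_gt < bg_thresh:
            return "background"  # negative anchor
        return "neutral"         # excluded from training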

What if my anchor boxes don't give me an IOU greater than 0.27 for an image?

How do I change the anchor boxes (or other parts of my RPN) so that I can have foreground labels?

What I've done so far...

  • Read an image in and make it ready for prediction by the headless VGG CNN
  • Make a prediction on the image and get the output, i.e. the feature map (7, 7, 512), and map that back to the input image (224, 224, 3)
  • Find the coordinates of the anchor points (7x7 = 49 points) and add an offset of 16 pixels.
  • The ratio of the feature map (7, 7) to the input image (224, 224) is 32, so I created 9 potential bounding boxes for each anchor point with scales = [1, 2, 3] and aspect ratios = [2, 1, 1/2]. Here is an example of the potential bounding boxes from one anchor point. Note: the white box is the ground truth, the red dots are the anchor points and the 9 blue boxes are the anchor boxes.

[Image: the 9 anchor boxes generated around a single anchor point, with the ground truth box]

  • All potential boxes were looped over and their IOU with the ground truth was compared. The maximum IOU was calculated to be 0.55, not enough for the 0.7 threshold. The image below shows all potential boxes that exceed an IOU of 0.25.

[Image: all potential boxes that exceed an IOU of 0.25 with the ground truth]

As you can see, none of the proposed regions have a good overlap with the ground truth, so I can't produce any foreground labels for the RPN. I probably need to change the anchor points, create more of them, or shift them, but I'm not sure how to do this.
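
One sanity check that helps here (just a sketch; the width/height convention below, w = stride * scale * sqrt(ratio) and h = stride * scale / sqrt(ratio), is an assumption and not necessarily what my helper uses): even an anchor that is perfectly centred on the ground truth has its IoU capped by how well its width and height match the ground truth, so the best-case IoU per (scale, aspect ratio) pair tells you whether shifting anchor points can ever reach 0.7 or whether more scales/ratios are needed.

    import math

    def best_case_iou(gt_w, gt_h, anchor_w, anchor_h):
        """Best-case IoU: the anchor is perfectly centred on the ground truth box."""
        intersection = min(gt_w, anchor_w) * min(gt_h, anchor_h)
        union = gt_w * gt_h + anchor_w * anchor_h - intersection
        return intersection / union

    stride = 32           # feature map (7, 7) -> input image (224, 224)
    scales = [1, 2, 3]
    ratios = [2, 1, 1/2]  # width:height

    gt_w, gt_h = 90, 30   # hypothetical ground truth size, replace with your own

    for scale in scales:
        for ratio in ratios:
            anchor_w = stride * scale * math.sqrt(ratio)
            anchor_h = stride * scale / math.sqrt(ratio)
            print(f"scale={scale} ratio={ratio}: best-case IoU = "
                  f"{best_case_iou(gt_w, gt_h, anchor_w, anchor_h):.2f}")

If none of the pairs get close to 0.7 even in this best case, no amount of shifting the 49 anchor points will help, and the scales/ratios themselves have to change.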

Some of the code is below:

    def get_iou(bb1, bb2):
        """
        Gets the Intersection over Union (IoU), i.e. how much the two boxes overlap.
        Assumption 1: Each box is a dictionary of the form
            {"x1": top left corner x coord, "y1": top left corner y coord,
             "x2": bottom right corner x coord, "y2": bottom right corner y coord}
        """
        assert bb1['x1'] < bb1['x2']
        assert bb1['y1'] < bb1['y2']
        assert bb2['x1'] < bb2['x2']
        assert bb2['y1'] < bb2['y2']

        x_left = max(bb1['x1'], bb2['x1'])
        y_top = max(bb1['y1'], bb2['y1'])
        x_right = min(bb1['x2'], bb2['x2'])
        y_bottom = min(bb1['y2'], bb2['y2'])

        # No overlap at all
        if x_right < x_left or y_bottom < y_top:
            return 0.0

        intersection_area = (x_right - x_left) * (y_bottom - y_top)
        bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
        bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

        iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
        assert 0.0 <= iou <= 1.0
        return iou

    # Extract features
    prediction_ready_img = pre_process_image_for_vgg(img)
    feature_extractor_list = vggmodel.predict(prediction_ready_img)

    # Get shapes of input image and features
    input_image_shape = prediction_ready_img[0].shape
    img_height, img_width, _ = input_image_shape
    features_height, features_width, _ = feature_extractor_list[0].shape

    # Find mapping from features map (output of vggmodel.predict) back to the input image
    feature_to_input_x = img_width / features_width
    feature_to_input_y = img_height / features_height
    
    x_offset = feature_to_input_x/2
    y_offset = feature_to_input_y/2

    # For the feature map (x,y) determine input image (x,y) as array 
    feature_to_input_coords_x  = [int(x_feature*feature_to_input_x+x_offset) for x_feature in range(features_width)]
    feature_to_input_coords_y  = [int(y_feature*feature_to_input_y+y_offset) for y_feature in range(features_height)]
    coordinate_of_anchor_boxes = [{"x":x,"y":y} for x in feature_to_input_coords_x for y in feature_to_input_coords_y]

    boxes_width_height = generate_potential_box_dimensions(config["AnchorBox"],feature_to_input_x,feature_to_input_y)
    list_of_potential_boxes_for_coords = [generate_potential_boxes_for_coord(boxes_width_height,coord) for coord in coordinate_of_anchor_boxes]
    potential_boxes = [box for boxes_for_coord in list_of_potential_boxes_for_coords for box in boxes_for_coord]
    potential_boxes_in_img = [box for box in potential_boxes if is_box_in_image_bounds(input_image_shape,box)]

    max_iou = max(get_iou(scaled_ground_truth_box, box) for box in potential_boxes_in_img)
    iou_thresholds = [v / 100 for v in range(0, 100, 5) if v / 100 < max_iou]
    for iou_threshold in iou_thresholds:
        interested_boxes = [box for box in potential_boxes_in_img
                            if get_iou(scaled_ground_truth_box, box) > iou_threshold]
        interested_ious = [get_iou(scaled_ground_truth_box, box) for box in interested_boxes]
        print(f"IOU={iou_threshold} num boxes={len(interested_boxes)} iou={interested_ious}")
        display_overlayed_feature_map_and_all_potential_boxes(img, coordinate_of_anchor_boxes, interested_boxes,
                                                              ground_truth=ground_truth_box, wait_time_ms=1000)
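
The helper functions referenced above (generate_potential_box_dimensions, generate_potential_boxes_for_coord and is_box_in_image_bounds) aren't shown; a simplified sketch of what helpers along those lines could look like (not necessarily my exact versions, and assuming config["AnchorBox"] holds "scales" and "ratios" keys) is:

    import math

    def generate_potential_box_dimensions(anchor_config, stride_x, stride_y):
        """One (width, height) pair per scale/ratio combination."""
        dims = []
        for scale in anchor_config["scales"]:      # e.g. [1, 2, 3]   (assumed key)
            for ratio in anchor_config["ratios"]:  # e.g. [2, 1, 1/2] (assumed key), width:height
                dims.append({
                    "width": stride_x * scale * math.sqrt(ratio),
                    "height": stride_y * scale / math.sqrt(ratio),
                })
        return dims

    def generate_potential_boxes_for_coord(boxes_width_height, coord):
        """Centre each (width, height) pair on the anchor point coord = {"x": ..., "y": ...}."""
        return [{
            "x1": int(coord["x"] - wh["width"] / 2),
            "y1": int(coord["y"] - wh["height"] / 2),
            "x2": int(coord["x"] + wh["width"] / 2),
            "y2": int(coord["y"] + wh["height"] / 2),
        } for wh in boxes_width_height]

    def is_box_in_image_bounds(image_shape, box):
        """Keep only boxes that lie fully inside the image."""
        img_height, img_width, _ = image_shape
        return (box["x1"] >= 0 and box["y1"] >= 0
                and box["x2"] <= img_width and box["y2"] <= img_height)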

Research so far...

  1. [Step by step explanation of RPN + extra] - https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html
  2. [vgg with top=false will only output the feature maps which is (7,7,512), other solutions will have different features produced] - https://github.com/keras-team/keras/issues/4465
  3. [Understanding anchor boxes] - https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/
  4. [Faster RCNN - how they calculate stride] - https://stats.stackexchange.com/questions/314823/how-is-the-stride-calculated-in-the-faster-rcnn-paper
  5. [Good article on Faster RCNN explained] - https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8
  6. [Indicates that anchor boxes should be determined by ratio and scale; ratio should be width:height of 1:2, 1:1, 2:1 and scale should be 1, 1/2, 1/3] - https://keras.io/examples/vision/retinanet/
  7. [Best explanation of anchor boxes] - https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html#:~:text=Anchor%20boxes%20are%20a%20set,sizes%20in%20your%20training%20datasets
  8. [Summary of object detection history, interesting read] - https://dudeperf3ct.github.io/object/detection/2019/01/07/Mystery-of-Object-Detection/
  9. [Mask RCNN Jupyter Notebook] - https://github.com/matterport/Mask_RCNN/blob/master/samples/coco/inspect_model.ipynb
  10. [RPN in Python Keras which I'm trying to understand] - https://github.com/dongjk/faster_rcnn_keras/blob/master/RPN.py
  11. [RPN implementation Keras Python] - https://github.com/you359/Keras-FasterRCNN/blob/master/keras_frcnn/data_generators.py

Wow, you made it all the way to the bottom, hope you had a good read!


Solution

  • Turns out the answer is to just take the next best proposed region: select the potential box (anchor box) that has the highest IOU with the ground truth.

    # We assign a positive label to two kinds of anchors: (i) the
    # anchor/anchors with the highest Intersection-over-Union
    # (IoU) overlap with a ground-truth box, or (ii) an
    # anchor that has an IoU overlap higher than 0.7 with any gt boxes
    
    def get_foreground_and_background_labels(scaled_ground_truth_box, potential_boxes_in_img,
                                             label_set_size=256, background_iou_thresh=0.0,
                                             foreground_iou_thresh=0.7):
        """
            Gets a set of labelled foreground and background boxes
            First, loops through all the potential boxes in the image and labels those whose IOU
                with the ground truth is greater than or equal to foreground_iou_thresh as foreground
            Second, checks whether any potential boxes met that threshold.
                If none did, it falls back to the next best option, the box with the maximum IOU
                If there are still no foreground labels an error is raised
            Third, adds background labels, which have an IOU of less than or equal to background_iou_thresh,
                up to the label_set_size
                Assumption 1: background_iou_thresh will be a float between 0 - 1
                              background regions will be those with an IOU less than background_iou_thresh
                Assumption 2: foreground_iou_thresh will be a float between 0 - 1
                              foreground regions will be those with an IOU more than foreground_iou_thresh
                Assumption 3: If there are no proposed regions with an IOU with the ground truth above foreground_iou_thresh
                              then the next best option is taken as long as the IOU is above 0.
                Assumption 4: There is only one object per image, aka only 1 ground truth box per image
        """
        # Precompute the IOU of every potential box with the ground truth
        iou_box_with_gtruth = [get_iou(scaled_ground_truth_box, box) for box in potential_boxes_in_img]

        # Generate foreground aka object labels from the threshold, stop after label_set_size/2
        foreground_box_labels = []
        for index, potential_box in enumerate(potential_boxes_in_img):
            if iou_box_with_gtruth[index] >= foreground_iou_thresh:
                foreground_box_labels.append(potential_box)
            if len(foreground_box_labels) > label_set_size / 2:
                break

        # If no potential box is above the IOU threshold then pick the next best thing
        # This was likely to happen in my dataset
        if len(foreground_box_labels) == 0:
            max_iou = max(iou_box_with_gtruth)
            assert max_iou > 0  # Raise an error if even the best box has no overlap at all
            best_potential_box = potential_boxes_in_img[iou_box_with_gtruth.index(max_iou)]
            foreground_box_labels.append(best_potential_box)

        # Generate background aka not-object labels from the threshold
        background_box_labels = []
        for index, potential_box in enumerate(potential_boxes_in_img):
            if iou_box_with_gtruth[index] <= background_iou_thresh:
                background_box_labels.append(potential_box)
            if len(background_box_labels) + len(foreground_box_labels) >= label_set_size:
                break

        return foreground_box_labels, background_box_labels
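
For completeness, here's roughly how it slots into the snippet further up (a usage sketch, reusing the variables already defined there):

    # Label anchors for one image, reusing scaled_ground_truth_box and
    # potential_boxes_in_img from the earlier snippet.
    foreground_boxes, background_boxes = get_foreground_and_background_labels(
        scaled_ground_truth_box,
        potential_boxes_in_img,
        label_set_size=256,
        background_iou_thresh=0.0,
        foreground_iou_thresh=0.7,
    )
    print(f"foreground anchors: {len(foreground_boxes)}, background anchors: {len(background_boxes)}")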