Tags: python, pytorch, object-detection, yolo, yolov7

How to extract and visualize feature values for an arbitrary layer during inference with YOLOv7?


In my case, I would like to extract and visualize the features output by layers 102, 103, and 104 in the following excerpt from cfg/training/yolov7.yaml.

# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]], # 51
  
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]], # route backbone P4
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 63
   
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]], # 75
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 88
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],
   
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]], # 101
   
   [75, 1, RepConv, [256, 3, 1]],   #extract
   [88, 1, RepConv, [512, 3, 1]],   #extract
   [101, 1, RepConv, [1024, 3, 1]], #extract

   [[102,103,104], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)
  ]

Also, the following is the result of printing the model; the dashed line marks the layers omitted for brevity.

Model(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (2): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
----------------------------------------------------
    (102): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (103): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (104): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (105): IDetect(
      (m): ModuleList(
        (0): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 21, kernel_size=(1, 1), stride=(1, 1))
      )
      (ia): ModuleList(
        (0): ImplicitA()
        (1): ImplicitA()
        (2): ImplicitA()
      )
      (im): ModuleList(
        (0): ImplicitM()
        (1): ImplicitM()
        (2): ImplicitM()
      )
    )
  )
)
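For reference, the row index in the yaml head corresponds one-to-one to the module index in model.model, so the same numbers (102, 103, 104) index the network in PyTorch. A minimal sketch to confirm the mapping, assuming the standard WongKinYiu/yolov7 repo layout and a local yolov7.pt:

from models.experimental import attempt_load

model = attempt_load('yolov7.pt', map_location='cpu')  # load FP32 model
for i, m in enumerate(model.model):
    print(i, m.__class__.__name__)  # e.g. "102 RepConv", "105 IDetect"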

However, I would like to be able to extract the features of any layer, since I may also need features from layers other than these.

How can I do this?

Referring to https://github.com/ultralytics/yolov5/issues/3089, I tried to do the extraction and visualization from the Model class in models/yolo.py, but could not figure out which code to edit or how. I tried the same with the IDetect class, but could not figure that out either.


Solution

  • Thanks to @DerekG for helping me figure this out! The following is the code in yolov7/detect.py after the resolution; the ----- lines indicate omitted code.

    -------------------------------------------------------------
    from utils.plots import plot_one_box, plot_ts_feature_maps # add the plot_ts_feature_maps function
    -------------------------------------------------------------
    def detect(save_img=False):
    -------------------------------------------------------------
        # Load model
        model = attempt_load(weights, map_location=device)  # load FP32 model
        ---------------------------------------------------------------------
        # Set Dataloader
        vid_path, vid_writer = None, None
        if webcam:
            view_img = check_imshow()
            cudnn.benchmark = True  # set True to speed up constant image size inference
            dataset = LoadStreams(source, img_size=imgsz, stride=stride)
        else:
            dataset = LoadImages(source, img_size=imgsz, stride=stride)
        --------------------------------------------------------------------------
        for path, img, im0s, vid_cap in dataset:
            img = torch.from_numpy(img).to(device)
            img = img.half() if half else img.float()  # uint8 to fp16/32
            img /= 255.0  # 0 - 255 to 0.0 - 1.0
            if img.ndimension() == 3:
                img = img.unsqueeze(0)
            ------------------------------------------------------------------
            # Start of added code
    
            def make_hook(key):
                def hook(model, input, output):
                    intermediate_output[key] = output.detach()
                return hook
    
            layer_num = 104 # index of the intermediate layer to extract
            intermediate_output = {}
            hook_handle = model.model[layer_num].register_forward_hook(make_hook(layer_num))

            # forward pass (the hook stores the layer's output in intermediate_output)
            model(img)
            hook_handle.remove()  # avoid stacking a new hook on every loop iteration
    
            # print feature map shape
            feature_maps = intermediate_output[layer_num]
            print(feature_maps.shape)
    
            # Outputs a feature map of the intermediate layer
            plot_ts_feature_maps(feature_maps)
    
            # End of added code
    
            t2 = time_synchronized()
            ------------------------------------------------------------------
    
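    Since the goal was to extract features from arbitrary layers, the same hook mechanism generalizes to several layers at once. A hedged sketch (layer_nums and handles are illustrative names, not from the repo):

    # Register one hook per layer of interest and keep the handles,
    # so the hooks can be removed after the forward pass.
    layer_nums = [102, 103, 104]
    intermediate_output = {}
    handles = [model.model[i].register_forward_hook(make_hook(i)) for i in layer_nums]

    model(img)  # a single forward pass fills intermediate_output for all hooked layers

    for i in layer_nums:
        print(i, intermediate_output[i].shape)

    for h in handles:
        h.remove()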

    The following was also added to yolov7/utils/plots.py. Torchshow is a module for visualizing tensors; the official GitHub repository is https://github.com/xwying/torchshow

    -------------------------------------------------------------------------
    # Add module
    import torchshow as ts
    -------------------------------------------------------------------------
    # Add the plot_ts_feature_maps function at the bottom
    def plot_ts_feature_maps(feature_maps):
        import matplotlib
        matplotlib.use('TkAgg')  # interactive backend so the figure window opens
        feature_maps = feature_maps.to(torch.float32)  # convert from possible fp16 (half-precision inference) before plotting
        ts.show(feature_maps[0])  # drop the batch dimension and show the channels of the first image
    
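    As a side note, on a headless machine where the TkAgg backend is unavailable, torchshow can save the visualization to a file instead of opening a window (ts.save is documented in the torchshow README alongside ts.show). A sketch with a hypothetical helper name:

    def save_ts_feature_maps(feature_maps, path='feature_maps.png'):
        # Hypothetical variant of plot_ts_feature_maps that writes to disk.
        feature_maps = feature_maps.to(torch.float32)
        ts.save(feature_maps[0], path)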

    As a test, to extract four feature maps from the second layer, I changed layer_num = 1 in detect.py and ts.show(feature_maps[0][:4]) in plots.py, then ran the following command.

    python detect.py --weights yolov7.pt --source inference/images/horses.jpg --device 0 --no-trace
    

    The inference results and feature maps were then output successfully. [Images: inference results; feature map]