Tags: python, pytorch, object-detection, yolo, yolov7

How to extract and visualize feature values for an arbitrary layer during inference with YOLOv7?


In my case, I would like to extract and visualize the features output by layers 102, 103, and 104 in the following excerpt from cfg/training/yolov7.yaml.

# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]], # 51
  
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]], # route backbone P4
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 63
   
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]], # route backbone P3
   [[-1, -2], 1, Concat, [1]],
   
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]], # 75
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],
   
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]], # 88
      
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],
   
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]], # 101
   
   [75, 1, RepConv, [256, 3, 1]],   #extract
   [88, 1, RepConv, [512, 3, 1]],   #extract
   [101, 1, RepConv, [1024, 3, 1]], #extract

   [[102,103,104], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)
  ]

Also, the following is the result of printing the model; the dashed line marks the layers omitted for brevity.

Model(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
    (2): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (act): SiLU(inplace=True)
    )
----------------------------------------------------
    (102): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (103): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (104): RepConv(
      (act): SiLU(inplace=True)
      (rbr_reparam): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) # extract
    )
    (105): IDetect(
      (m): ModuleList(
        (0): Conv2d(256, 21, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 21, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 21, kernel_size=(1, 1), stride=(1, 1))
      )
      (ia): ModuleList(
        (0): ImplicitA()
        (1): ImplicitA()
        (2): ImplicitA()
      )
      (im): ModuleList(
        (0): ImplicitM()
        (1): ImplicitM()
        (2): ImplicitM()
      )
    )
  )
)
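For reference, the row index in the yaml head corresponds one-to-one to the module index in model.model, so the same numbers (102, 103, 104) index the network in PyTorch. A minimal sketch to confirm the mapping, assuming the standard WongKinYiu/yolov7 repo layout and a local yolov7.pt:

from models.experimental import attempt_load

model = attempt_load('yolov7.pt', map_location='cpu')  # load FP32 model
for i, m in enumerate(model.model):
    print(i, m.__class__.__name__)  # e.g. "102 RepConv", "105 IDetect"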

However, I would like to be able to extract the features of any layer, since I may also need features from layers other than these.

How can I do this?

Referring to https://github.com/ultralytics/yolov5/issues/3089, I tried to do the extraction and visualization from the Model class in models/yolo.py, but could not figure out which code to edit or how. I tried the same with the IDetect class, but could not figure that out either.


Solution

  • Thanks to @DerekG for helping me figure this out! The following is the code in yolov7/detect.py after the resolution; the ----- lines indicate omitted code.

    -------------------------------------------------------------
    from utils.plots import plot_one_box, plot_ts_feature_maps # add the plot_ts_feature_maps function
    -------------------------------------------------------------
    def detect(save_img=False):
    -------------------------------------------------------------
        # Load model
        model = attempt_load(weights, map_location=device)  # load FP32 model
        ---------------------------------------------------------------------
        # Set Dataloader
        vid_path, vid_writer = None, None
        if webcam:
            view_img = check_imshow()
            cudnn.benchmark = True  # set True to speed up constant image size inference
            dataset = LoadStreams(source, img_size=imgsz, stride=stride)
        else:
            dataset = LoadImages(source, img_size=imgsz, stride=stride)
        --------------------------------------------------------------------------
        for path, img, im0s, vid_cap in dataset:
            img = torch.from_numpy(img).to(device)
            img = img.half() if half else img.float()  # uint8 to fp16/32
            img /= 255.0  # 0 - 255 to 0.0 - 1.0
            if img.ndimension() == 3:
                img = img.unsqueeze(0)
            ------------------------------------------------------------------
            # Start of added code
    
            def make_hook(key):
                def hook(model, input, output):
                    intermediate_output[key] = output.detach()
                return hook
    
            layer_num = 104 # index of the intermediate layer to extract
            intermediate_output = {}
            hook_handle = model.model[layer_num].register_forward_hook(make_hook(layer_num))

            # forward pass (the hook stores the layer's output in intermediate_output)
            model(img)
            hook_handle.remove()  # avoid stacking a new hook on every loop iteration
    
            # print feature map shape
            feature_maps = intermediate_output[layer_num]
            print(feature_maps.shape)
    
            # Outputs a feature map of the intermediate layer
            plot_ts_feature_maps(feature_maps)
    
            # End of added code
    
            t2 = time_synchronized()
            ------------------------------------------------------------------
    
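    Since the goal was to extract features from arbitrary layers, the same hook mechanism generalizes to several layers at once. A hedged sketch (layer_nums and handles are illustrative names, not from the repo):

    # Register one hook per layer of interest and keep the handles,
    # so the hooks can be removed after the forward pass.
    layer_nums = [102, 103, 104]
    intermediate_output = {}
    handles = [model.model[i].register_forward_hook(make_hook(i)) for i in layer_nums]

    model(img)  # a single forward pass fills intermediate_output for all hooked layers

    for i in layer_nums:
        print(i, intermediate_output[i].shape)

    for h in handles:
        h.remove()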

    The following was also added to yolov7/utils/plots.py. Torchshow is a module for visualizing tensors; the official GitHub repository is https://github.com/xwying/torchshow

    -------------------------------------------------------------------------
    # Add module
    import torchshow as ts
    -------------------------------------------------------------------------
    # Add the plot_ts_feature_maps function at the bottom
    def plot_ts_feature_maps(feature_maps):
        import matplotlib
        matplotlib.use('TkAgg')  # interactive backend so the figure window opens
        feature_maps = feature_maps.to(torch.float32)  # convert from possible fp16 (half-precision inference) before plotting
        ts.show(feature_maps[0])  # drop the batch dimension and show the channels of the first image
    
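    As a side note, on a headless machine where the TkAgg backend is unavailable, torchshow can save the visualization to a file instead of opening a window (ts.save is documented in the torchshow README alongside ts.show). A sketch with a hypothetical helper name:

    def save_ts_feature_maps(feature_maps, path='feature_maps.png'):
        # Hypothetical variant of plot_ts_feature_maps that writes to disk.
        feature_maps = feature_maps.to(torch.float32)
        ts.save(feature_maps[0], path)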

    As a test, to extract four feature maps from the second layer, I changed layer_num = 1 in detect.py and ts.show(feature_maps[0][:4]) in plots.py, then ran the following command.

    python detect.py --weights yolov7.pt --source inference/images/horses.jpg --device 0 --no-trace
    

    The inference results and feature maps were then output successfully. [Images: inference results; feature map]