
What is the biggest bottleneck in the maskrcnn_benchmark repo?


I am working on a repo that makes use of the maskrcnn_benchmark repo. I have extensively explored the benchmarking repo for the cause of its slower performance on a CPU compared to enter link description here.

To benchmark the individual forward passes, I put a time counter in each part, which gives me the time required to compute each component. I have had a tough time pinpointing exactly which is the slowest component of the entire architecture. I believe it to be the BottleneckWithFixedBatchNorm class in the maskrcnn_benchmark/modeling/backbone/resnet.py file.

I would really appreciate any help in localizing the biggest bottleneck in this architecture.


Solution

  • I have faced the same problem. The best solution is to look inside the main code, go through the forward pass of each module, and set up a timer to log the time spent in each module's computations. Our approach was to add a time logger to each class, so that every instance of the class logs its own execution time. After a thorough comparison, at least in our case, we found that the cause of the delay was the depth of the ResNet module (which, given the computational cost of ResNet, is not surprising at all). The only remedy is more parallelization: either use a bigger GPU for the task or reduce the depth of the ResNet network. A minimal sketch of the per-module timing idea follows.
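
    Here is that idea sketched with PyTorch forward hooks, so you do not have to hand-edit every class; attach_timers and the log path are illustrative names I chose, not maskrcnn_benchmark API:

    import time

    from torch import nn

    def attach_timers(model: nn.Module, log_path: str = "timelogger.log"):
        """Log the forward-pass latency of every submodule of `model`."""
        def pre_hook(module, inputs):
            # Stash the start time on the module itself.
            module._start_time = time.time()

        def post_hook(module, inputs, output):
            # Append "<class name> :: <seconds>" to the log file.
            with open(log_path, "a") as f:
                print(type(module).__name__, "::",
                      time.time() - module._start_time, file=f)

        for m in model.modules():
            m.register_forward_pre_hook(pre_hook)
            m.register_forward_hook(post_hook)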

    I must point out that maskrcnn_benchmark has been deprecated; an updated version is available in the form of detectron2. Consider migrating your code to it for significant speed improvements. A rough sketch of the equivalent setup under detectron2 is shown below.
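
    As a rough sketch, a comparable Mask R-CNN model can be instantiated from detectron2's model zoo like this (the particular config file is an assumption; pick the one that matches your model):

    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    # The COCO Mask R-CNN R50-FPN config is used purely as an example.
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    cfg.MODEL.DEVICE = "cpu"  # run on CPU, as in the question
    predictor = DefaultPredictor(cfg)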

    BottleneckWithFixedBatchNorm is not the most expensive operation in the architecture and, despite its name, is certainly not the bottleneck. The class is not especially computationally expensive and is computed in parallel even on a lower-end CPU machine (at least at the inference stage). You can verify this with the profiler snippet below.
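
    One way to verify this is to profile a single forward pass with PyTorch's built-in autograd profiler and sort the operators by total CPU time (model and inputs stand in for your own network and data):

    import torch
    from torch.autograd import profiler

    # model and inputs are placeholders for your own network and data.
    with torch.no_grad(), profiler.profile() as prof:
        model(inputs)
    # Show the ten operators that dominate total CPU time.
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))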

    An example of how to better track the performance of each module, with the code taken from maskrcnn_benchmark/modeling/backbone/resnet.py:

    import time

    from torch import nn

    # _STEM_MODULES, _STAGE_SPECS, _TRANSFORMATION_MODULES and _make_stage
    # are defined elsewhere in the same resnet.py file.

    class ResNet(nn.Module):
        def __init__(self, cfg):
            super(ResNet, self).__init__()
    
            # If we want to use the cfg in forward(), then we should make a copy
            # of it and store it for later use:
            # self.cfg = cfg.clone()
    
            # Translate string names to implementations
            stem_module = _STEM_MODULES[cfg.MODEL.RESNETS.STEM_FUNC]
            stage_specs = _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY]
            transformation_module = _TRANSFORMATION_MODULES[cfg.MODEL.RESNETS.TRANS_FUNC]
    
            # Construct the stem module
            self.stem = stem_module(cfg)
    
            # Construct the specified ResNet stages
            num_groups = cfg.MODEL.RESNETS.NUM_GROUPS
            width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP
            in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS
            stage2_bottleneck_channels = num_groups * width_per_group
            stage2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
            self.stages = []
            self.return_features = {}
            for stage_spec in stage_specs:
                name = "layer" + str(stage_spec.index)
                stage2_relative_factor = 2 ** (stage_spec.index - 1)
                bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor
                out_channels = stage2_out_channels * stage2_relative_factor
                stage_with_dcn = cfg.MODEL.RESNETS.STAGE_WITH_DCN[stage_spec.index - 1]
                module = _make_stage(
                    transformation_module,
                    in_channels,
                    bottleneck_channels,
                    out_channels,
                    stage_spec.block_count,
                    num_groups,
                    cfg.MODEL.RESNETS.STRIDE_IN_1X1,
                    first_stride=int(stage_spec.index > 1) + 1,
                    dcn_config={
                        "stage_with_dcn": stage_with_dcn,
                        "with_modulated_dcn": cfg.MODEL.RESNETS.WITH_MODULATED_DCN,
                        "deformable_groups": cfg.MODEL.RESNETS.DEFORMABLE_GROUPS,
                    }
                )
                in_channels = out_channels
                self.add_module(name, module)
                self.stages.append(name)
                self.return_features[name] = stage_spec.return_features
    
            # Optionally freeze (requires_grad=False) parts of the backbone
            self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_CONV_BODY_AT)
    
        def _freeze_backbone(self, freeze_at):
            if freeze_at < 0:
                return
            for stage_index in range(freeze_at):
                if stage_index == 0:
                    m = self.stem  # stage 0 is the stem
                else:
                    m = getattr(self, "layer" + str(stage_index))
                for p in m.parameters():
                    p.requires_grad = False
    
        def forward(self, x):
            # Start a timer so every forward pass through the trunk is measured.
            start_timer = time.time()
            outputs = []
            x = self.stem(x)
            for stage_name in self.stages:
                x = getattr(self, stage_name)(x)
                if self.return_features[stage_name]:
                    outputs.append(x)
            # Append the elapsed time for the whole ResNet trunk to a log file.
            print("ResNet time :: ", time.time() - start_timer,
                  file=open("timelogger.log", "a"))
            return outputs
    

    The only change that has to be made is in the forward pass; every instance created from this class will then inherit the behaviour and log its execution time (here the time is appended to a file instead of printed to stdout). To make sense of the resulting log afterwards, a small aggregation helper is sketched below.
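
    A small helper to aggregate the log; it assumes the "<name> :: <seconds>" format written by the forward pass above:

    from collections import defaultdict

    totals = defaultdict(float)
    with open("timelogger.log") as f:
        for line in f:
            name, sep, seconds = line.rpartition("::")
            if sep:
                totals[name.strip()] += float(seconds)

    # Print modules from slowest to fastest total time.
    for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {total:.4f}s")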