Tags: machine-learning, deep-learning, computer-vision, mxnet

MxNet has trouble saving all parameters of a network


In my experiments, MXNet appears to skip some of my network's parameters when saving them.

I am studying MXNet’s gluoncv package (https://gluon-cv.mxnet.io/index.html). To learn from how its engineers write code, I manually build an SSD with ‘gluoncv.model_zoo.ssd.SSD’. The arguments I pass to this class are the same as for the official ‘ssd_512_resnet50_v1_voc’ network, except for ‘classes=('car', 'pedestrian', 'truck', 'trafficLight', 'biker')’.

from gluoncv.model_zoo.ssd import SSD
import mxnet as mx
name = 'resnet50_v1'
base_size = 512
features=['stage3_activation5', 'stage4_activation2']
filters=[512, 512, 256, 256]
sizes=[51.2, 102.4, 189.4, 276.4, 363.52, 450.6, 492]
ratios=[[1, 2, 0.5]] + [[1, 2, 0.5, 3, 1.0/3]] * 3 + [[1, 2, 0.5]] * 2
steps=[16, 32, 64, 128, 256, 512]
classes=('car', 'pedestrian', 'truck', 'trafficLight', 'biker')

pretrained=True

net = SSD(network=name, base_size=base_size, features=features,
          num_filters=filters, sizes=sizes, ratios=ratios, steps=steps,
          pretrained=pretrained, classes=classes)

I then feed a synthetic input x to this network, and it raises the following error.

batch_size = 1  # not defined in the original snippet; 1 is an assumed value
x = mx.nd.zeros(shape=(batch_size, 3, base_size, base_size))
cls_preds, box_preds, anchors = net(x)

RuntimeError: Parameter 'ssd0_expand_trans_conv0_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks

This is reasonable. The SSD uses ‘gluoncv.nn.feature.FeatureExpander’ to add new layers on top of 'resnet50_v1', and I forgot to initialize them. So I run the following code.

net.initialize()

However, this produces a lot of warnings.

  v.initialize(None, ctx, init, force_reinit=force_reinit)
C:\Users\Bird\AppData\Local\conda\conda\envs\ssd\lib\site-packages\mxnet\gluon\parameter.py:687: UserWarning: Parameter 'ssd0_resnetv10_stage4_batchnorm9_running_mean' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
C:\Users\Bird\AppData\Local\conda\conda\envs\ssd\lib\site-packages\mxnet\gluon\parameter.py:687: UserWarning: Parameter 'ssd0_resnetv10_stage4_batchnorm9_running_var' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)

The 'resnet50_v1' backbone of the SSD is pre-trained, so its parameters are already initialized and are skipped here. However, these warnings are annoying.

How can I turn them off?
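
One way to silence them, sketched here with Python's standard `warnings` module rather than any MXNet-specific switch, is to filter `UserWarning`s whose message matches "is already initialized". The helper name `quiet_reinit_warnings` is made up for illustration, and the pattern assumes the message text shown in the output above.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_reinit_warnings():
    """Temporarily hide 'is already initialized' UserWarnings (hypothetical helper)."""
    with warnings.catch_warnings():
        # The pattern is a regex matched from the start of the warning message.
        warnings.filterwarnings(
            "ignore",
            message=r".*is already initialized",
            category=UserWarning,
        )
        yield


# Intended usage (assumes `net` is the SSD built above; not executed here):
# with quiet_reinit_warnings():
#     net.initialize()
```

Other warnings still get through, and the filter is removed again when the `with` block exits.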

Now comes the main problem. I would like to save the parameters of the network.

net.save_params('F:/Temps/Models_tmp/' +'myssd.params')

The parameter file of 'resnet50_v1' (‘resnet50_v1-c940b1a0.params’) is 97.7MB; however, my parameter file is only 9.96MB. Is there some magic technique that compresses these parameters?
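
A rough size check suggests that compression is not what is happening. The sketch below assumes the parameters are stored as 4-byte float32 values and that "MB" means 2**20 bytes. Both are assumptions, and per-array header overhead is ignored.

```python
BYTES_PER_FLOAT32 = 4


def approx_float32_count(size_mb, bytes_per_mb=2**20):
    """Estimate how many float32 values fit in a file of `size_mb` megabytes."""
    return int(size_mb * bytes_per_mb / BYTES_PER_FLOAT32)


backbone_values = approx_float32_count(97.7)  # pretrained resnet50_v1 file
saved_values = approx_float32_count(9.96)     # myssd.params

print(f"backbone file: ~{backbone_values / 1e6:.1f}M values")
print(f"saved file:    ~{saved_values / 1e6:.1f}M values")
print(f"ratio:         {saved_values / backbone_values:.2f}")
```

Under these assumptions the saved file holds roughly a tenth as many values as the backbone file alone, which points to parameters being left out rather than compressed.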

To test this, I open a new console and rebuild the same network. Then I load the saved parameters and feed an input to the network.

net.load_params('F:/Temps/Models_tmp/' +'myssd.params')
x = mx.nd.zeros(shape=(batch_size, 3, base_size, base_size))
cls_preds, box_preds, anchors = net(x)

The initialization error appears again.

RuntimeError: Parameter 'ssd0_expand_trans_conv0_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks

This cannot be right, because the saved file 'myssd.params' should contain all the initialized parameters of my network.

To find the block ‘ssd0_expand_trans_conv0’, I dug deeper into ‘gluoncv.nn.feature.FeatureExpander’. There I replaced ‘mx.sym.Convolution’ with ‘mxnet.gluon.nn.Conv2D’ in the ‘FeatureExpander’ function.

        '''
        y = mx.sym.Convolution(
            y, num_filter=num_trans, kernel=(1, 1), no_bias=use_bn,
            name='expand_trans_conv{}'.format(i), attr={'__init__': weight_init})
        '''
        # Gluon replacement for the 1x1 transition convolution above.
        # no_bias=use_bn corresponds to use_bias=not use_bn.
        Conv1 = nn.Conv2D(channels=num_trans, kernel_size=(1, 1),
                          use_bias=not use_bn, weight_initializer=weight_init)
        y = Conv1(y)
        Conv1.initialize(verbose=True)
    '''
    y = mx.sym.Convolution(
        y, num_filter=f, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
        no_bias=use_bn, name='expand_conv{}'.format(i), attr={'__init__': weight_init})
    '''
    # Gluon replacement for the 3x3 stride-2 convolution above.
    Conv2 = nn.Conv2D(channels=f, kernel_size=(3, 3), padding=(1, 1), strides=(2, 2),
                      use_bias=not use_bn, weight_initializer=weight_init)
    y = Conv2(y)
    Conv2.initialize(verbose=True)

These new blocks can be initialized manually. However, MXNet still reports the same error, so the manual initialization seems to have no effect.

How can I save all the parameters of my network and restore them?


Solution

  • There is a tutorial on the subject of saving and loading that may be of help: https://mxnet.apache.org/versions/1.6/api/python/docs/tutorials/packages/gluon/blocks/save_load_params.html
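
  • A possible workaround, offered as a sketch rather than a confirmed fix: save and load through `net.collect_params()`, the same collection that the error message recommends for initialization and that includes parameters of nested child blocks. The helper names `save_all_params` and `load_all_params` are invented here. `net` is assumed to be the SSD from the question, rebuilt with identical arguments (and therefore the same parameter names) before loading.

```python
def save_all_params(net, path):
    """Write every parameter reachable via collect_params() to `path`."""
    # ParameterDict.save stores each parameter under its full name.
    net.collect_params().save(path)


def load_all_params(net, path, ctx):
    """Read parameters written by save_all_params back into `net`."""
    # Loading matches parameters by name, so the rebuilt network must use
    # the same names (for example the same 'ssd0_' prefix).
    net.collect_params().load(path, ctx=ctx)


# Intended usage (requires mxnet and gluoncv; not executed here):
# net.initialize()
# net(x)  # one forward pass so deferred-shape parameters are allocated
# save_all_params(net, 'myssd.params')
# ... in a new session, rebuild `net` with the same arguments ...
# load_all_params(net, 'myssd.params', ctx=mx.cpu())
```

    Whether this restores 'ssd0_expand_trans_conv0_weight' depends on the installed MXNet and gluoncv versions; the tutorial linked above describes the officially recommended methods.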