
Missing -symbol.json error when trying to compile a SageMaker semantic segmentation model (built-in algorithm) with SageMaker Neo


I have trained a semantic segmentation model using the SageMaker built-in semantic segmentation algorithm. It deploys fine to a SageMaker endpoint, and I can run inference against it in the cloud. I would like to use the model on an edge device (AWS Panorama Appliance), which should only require compiling the model with SageMaker Neo for the target device.

However, regardless of which target device I choose in the Neo settings, I can't compile the model with Neo because I get the following error:

ClientError: InputConfiguration: No valid Mxnet model file -symbol.json found

The model.tar.gz for semantic segmentation models contains hyperparams.json, model_algo-1 and model_best.params. According to the docs, model_algo-1 is the serialized MXNet model. Aren't Gluon models supported by Neo?

Incidentally, I ran into exactly the same problem with another SageMaker built-in algorithm, k-Nearest Neighbors (k-NN): its model artifact also has no -symbol.json file.

Are there any scripts I can run to recreate a -symbol.json file, or to convert the trained SageMaker model?

After training my model with an Estimator, I tried to compile it with SageMaker Neo using this code:

optimized_ic = my_estimator.compile_model(
 target_instance_family="ml_c5",
 target_platform_os="LINUX",
 target_platform_arch="ARM64",
 input_shape={"data": [1,3,512,512]},  
 output_path=s3_optimized_output_location,
 framework="mxnet",
 framework_version="1.8", 
)

I would expect this to compile fine, but this is where I get the error saying the model is missing the *-symbol.json file.


Solution

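  • The SageMaker built-in algorithms are not directly compatible with Neo. Their model.tar.gz does not contain the *-symbol.json and *.params pair that Neo expects for MXNet models. However, you can rebuild the network, load the trained parameters from the model.tar.gz output file, export the network in that format, and then compile it.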

    Step 1: Extract the model from the tar file

    import tarfile

    # path to the local copy of the training job's output archive
    model = 'ss_model.tar.gz'

    # extract the archive into the current directory
    with tarfile.open(model, 'r:gz') as t:
        t.extractall()
    

    This should extract model_algo-1 and model_best.params (plus hyperparams.json, if your archive contains it).
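To see exactly which files the archive contains, you can also extract it into a dedicated folder and collect the member names. Below is a minimal sketch that uses only the standard library. The function name `extract_model_archive` and the `model_dir` folder are hypothetical. The path check refuses any member that would be written outside the target folder.

```python
import os
import tarfile


def extract_model_archive(archive_path, dest_dir="model_dir"):
    """Extract a SageMaker model.tar.gz into dest_dir and return the file names."""
    os.makedirs(dest_dir, exist_ok=True)
    root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # refuse members that would be written outside dest_dir
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    # report regular files only (skip directory entries)
    return sorted(m.name for m in members if m.isfile())
```

For example, `extract_model_archive("ss_model.tar.gz")` returns the sorted file names, such as `model_algo-1` and `model_best.params`.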

    Step 2: Load the weights into the matching network from the GluonCV model zoo

    In this case I used DeepLabV3 with a ResNet-50 backbone. nclass and backbone must match the model you trained.

    import gluoncv
    import mxnet as mx
    from gluoncv import model_zoo
    from gluoncv.data.transforms.presets.segmentation import test_transform

    # rebuild the architecture that was trained (2 classes, ResNet-50 backbone)
    model = model_zoo.DeepLabV3(nclass=2, backbone='resnet50', pretrained_base=False, height=800, width=1280, crop_size=240)
    # load the trained weights extracted from model.tar.gz
    model.load_parameters("model_algo-1")
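The constructor arguments must describe the same network that was trained, otherwise `load_parameters` will typically fail on missing or mismatched parameters. To avoid guessing, you can read them from `hyperparams.json`. Below is a minimal sketch. The key names (`algorithm`, `backbone`, `num_classes`, `crop_size`) are assumed from the documented hyperparameters of the built-in algorithm. The helper `deeplab_kwargs_from_hyperparams` is hypothetical, so check both against your own file.

```python
import json


def deeplab_kwargs_from_hyperparams(path="hyperparams.json"):
    """Map (assumed) SageMaker semantic segmentation hyperparameters to
    keyword arguments for gluoncv.model_zoo.DeepLabV3."""
    with open(path) as f:
        hp = json.load(f)
    # SageMaker passes hyperparameters as strings, so cast explicitly
    algorithm = hp.get("algorithm")
    if algorithm is not None and str(algorithm) != "deeplab":
        raise ValueError(f"trained with algorithm={algorithm!r}, not deeplab")
    # assumed naming: "resnet-50" in SageMaker vs "resnet50" in GluonCV
    backbone = str(hp.get("backbone", "resnet-50")).replace("-", "")
    kwargs = {
        "nclass": int(hp["num_classes"]),
        "backbone": backbone,
        "pretrained_base": False,  # weights come from model_algo-1 instead
    }
    if "crop_size" in hp:
        kwargs["crop_size"] = int(hp["crop_size"])
    return kwargs
```

You would then use it as `model_zoo.DeepLabV3(height=800, width=1280, **deeplab_kwargs_from_hyperparams())`.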
    
    Step 3: Check that the parameters loaded correctly by making a prediction with the new model

    Use an image that was used for training.

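    from mxnet import image

    # use cpu
    ctx = mx.cpu(0)
    # read and decode a training image ('train_image.jpg' is a placeholder name)
    with open('train_image.jpg', 'rb') as f:
        imbytes = f.read()
    img = image.imdecode(imbytes)

    # transform image: to tensor, normalize, add batch axis, move to ctx
    img = test_transform(img, ctx)
    img = img.astype('float32')
    print('transformed image shape: ', img.shape)

    # get prediction: per-pixel class scores
    output = model.predict(img)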
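A number is easier to judge than a visual check. The minimal sketch below compares the predicted class mask with the image's ground-truth label mask and reports pixel accuracy. The helper name `pixel_accuracy` is hypothetical. It works on flat lists of class indices, so it does not need MXNet.

```python
def pixel_accuracy(pred_mask, true_mask, ignore_label=None):
    """Fraction of pixels whose predicted class matches the ground truth.

    Both masks are flat sequences of integer class indices of equal length.
    Pixels whose true label equals ignore_label are skipped.
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    correct = total = 0
    for p, t in zip(pred_mask, true_mask):
        if ignore_label is not None and t == ignore_label:
            continue
        total += 1
        correct += int(p == t)
    return correct / total if total else 0.0
```

With the `output` above, you can get the predicted mask with `mx.nd.argmax(output, 1).asnumpy().ravel().astype(int).tolist()`. The label image must first be brought to the same height and width. A score close to what you saw during training suggests the weights loaded correctly. A score near chance points to a mismatch between the network definition and the saved parameters.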
    
    Step 4: Hybridize the model and export it in the format SageMaker Neo requires

    A forward pass on a dummy input after hybridize() builds the graph and also checks that the network accepts the chosen image shape.

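    from gluoncv.utils import export_block
    model.hybridize()
    model(mx.nd.ones((1,3,800,1280)))  # dummy forward pass with the export shape
    # writes deeplabv3-res50-symbol.json and deeplabv3-res50-0000.params
    export_block('deeplabv3-res50', model, data_shape=(3,800,1280), preprocess=None, layout='CHW')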
    
    Step 5: Repackage the model into a tar.gz archive

    This archive contains the .params and -symbol.json files that Neo looks for.

    with tarfile.open("comp_model.tar.gz", "w:gz") as tar:
        for name in ["deeplabv3-res50-0000.params", "deeplabv3-res50-symbol.json"]:
            tar.add(name)
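Before uploading, you may want to confirm that the new archive has the layout the error message asks for. The minimal sketch below assumes Neo needs one `*-symbol.json` and a matching `<prefix>-NNNN.params` at the archive root. This is a heuristic based on the error message and the names `export_block` writes, not Neo's actual validation logic. The helper name `neo_mxnet_files` is hypothetical.

```python
import re
import tarfile


def neo_mxnet_files(archive_path):
    """Return (symbol_file, params_file) if the archive looks like an
    exported MXNet model, otherwise None."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    for name in names:
        if name.endswith("-symbol.json"):
            prefix = name[: -len("-symbol.json")]
            # exported weights are named <prefix>-<epoch as 4 digits>.params
            pattern = re.compile(re.escape(prefix) + r"-\d{4}\.params")
            params = [n for n in names if pattern.fullmatch(n)]
            if params:
                return name, params[0]
    return None
```

On the repackaged `comp_model.tar.gz`, this should return the symbol and params file names. On the original `ss_model.tar.gz`, it returns `None`, which matches the situation the Neo error describes.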
    
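    Step 6: Upload the tar.gz file to S3, then compile it with Neo from the SageMaker console, giving the input name data and the shape the model was exported with, i.e. {"data": [1, 3, 800, 1280]}.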
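As an alternative to the console, you can start the compilation job from code through the SageMaker `CreateCompilationJob` API. The sketch below only builds the request. The helper name `build_compilation_request`, the job name, role ARN, S3 URIs, framework version, runtime limit and target device are all placeholders or assumptions to replace with your own values. `DataInputConfig` should normally use the same input name and shape the model was exported with, not the 512x512 shape from the question.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              target_device, input_shape=(1, 3, 800, 1280),
                              framework_version="1.8", max_runtime_s=900):
    """Build keyword arguments for boto3's sagemaker create_compilation_job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,  # location of comp_model.tar.gz
            # "data" is the input name Gluon's export uses for a single input
            "DataInputConfig": json.dumps({"data": list(input_shape)}),
            "Framework": "MXNET",
            "FrameworkVersion": framework_version,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }
```

You would then submit it with `boto3.client("sagemaker").create_compilation_job(**build_compilation_request(...))`, passing the target device value Neo lists for your hardware.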