
ClientError: train channel is not specified with AWS object_detection_augmented_manifest_training using ground truth images

I have completed a labelling job in AWS Ground Truth and started working on the notebook template for object detection.

I have two manifests (a train set and a validation set) with 293 labeled images of birds, like this:


Below are the parameters I am using for the notebook instance:

training_params = {
    "AlgorithmSpecification": {
        "TrainingImage": training_image,  # NB. This is one of the named constants defined in the first cell.
        "TrainingInputMode": "Pipe"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": s3_output_path
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p3.2xlarge",
        "VolumeSizeInGB": 5
    },
    "TrainingJobName": job_name,
    "HyperParameters": {  # NB. These hyperparameters are at the user's discretion and are beyond the scope of this demo.
        "base_network": "resnet-50",
        "use_pretrained_model": "1",
        "num_classes": "1",
        "mini_batch_size": "16",
        "epochs": "5",
        "learning_rate": "0.001",
        "lr_scheduler_step": "3,6",
        "lr_scheduler_factor": "0.1",
        "optimizer": "rmsprop",
        "momentum": "0.9",
        "weight_decay": "0.0005",
        "overlap_threshold": "0.5",
        "nms_threshold": "0.45",
        "image_shape": "300",
        "label_width": "350",
        "num_training_samples": str(num_training_samples)
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 86400
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",  # NB. Augmented Manifest
                    "S3Uri": s3_train_data_path,
                    "S3DataDistributionType": "FullyReplicated",
                    "AttributeNames": ["source-ref", "Bird-Label-Train"]  # NB. This must correspond to the JSON field names in your augmented manifest.
                }
            },
            "ContentType": "image/jpeg",
            "RecordWrapperType": "None",
            "CompressionType": "None"
        },
        {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",  # NB. Augmented Manifest
                    "S3Uri": s3_validation_data_path,
                    "S3DataDistributionType": "FullyReplicated",
                    "AttributeNames": ["source-ref", "Bird-Label"]  # NB. This must correspond to the JSON field names in your augmented manifest.
                }
            },
            "ContentType": "image/jpeg",
            "RecordWrapperType": "None",
            "CompressionType": "None"
        }
    ]
}
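
The comments on "AttributeNames" say the names must match the JSON field names in each manifest. A minimal sketch of that check follows, assuming each manifest line is a single JSON object. The helper name `missing_attributes`, the bucket, and the sample record are hypothetical and only illustrate the idea; a real record would come from the first line of the manifest in S3.

```python
import json


def missing_attributes(manifest_line, attribute_names):
    """Return the requested attribute names that are absent from one manifest record."""
    record = json.loads(manifest_line)
    return [name for name in attribute_names if name not in record]


# Hypothetical record, standing in for one line of the train manifest.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/bird-001.jpg",
    "Bird-Label-Train": {"annotations": [], "image_size": []},
})

# An empty list means every requested attribute is present in the record.
print(missing_attributes(sample_line, ["source-ref", "Bird-Label-Train"]))
# A non-empty list names the attributes this manifest does not contain.
print(missing_attributes(sample_line, ["source-ref", "Bird-Label"]))
```

Running the same check against the first line of both the train and the validation manifest would show whether each channel's "AttributeNames" list fits its own file.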

Running the training job on an ml.p3.2xlarge instance prints the following:

InProgress Starting
InProgress Starting
InProgress Starting
InProgress Training
Failed Failed

Followed by this error message: 'ClientError: train channel is not specified.'
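
For reference, status lines and a failure message like the ones above can be obtained by polling the job description. The following is a sketch, not the notebook's own code: it assumes a boto3 SageMaker client is passed in as `client`, and the function name `wait_for_training_job` and the polling interval are illustrative choices.

```python
import time


def wait_for_training_job(client, job_name, poll_seconds=60):
    """Poll a training job until it stops running; return (status, failure_reason)."""
    while True:
        desc = client.describe_training_job(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]
        print(status, desc.get("SecondaryStatus", ""))
        if status not in ("InProgress", "Stopping"):
            # FailureReason is only present when the job has failed.
            return status, desc.get("FailureReason")
        time.sleep(poll_seconds)


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   status, reason = wait_for_training_job(boto3.client("sagemaker"), job_name)
```

Taking the client as a parameter keeps the function easy to exercise with a stub before pointing it at a real job.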

Does anyone have any thoughts on how I can get this running with no errors? Any help is much appreciated!

Successful run: Below are the parameters that were used, along with the augmented manifest JSON objects, for a successful run.

training_params = {
    "AlgorithmSpecification": {
        "TrainingImage": training_image,  # NB. This is one of the named constants defined in the first cell.
        "TrainingInputMode": "Pipe"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": s3_output_path
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p3.2xlarge",
        "VolumeSizeInGB": 50
    },
    "TrainingJobName": job_name,
    "HyperParameters": {  # NB. These hyperparameters are at the user's discretion and are beyond the scope of this demo.
        "base_network": "resnet-50",
        "use_pretrained_model": "1",
        "num_classes": "3",
        "mini_batch_size": "1",
        "epochs": "5",
        "learning_rate": "0.001",
        "lr_scheduler_step": "3,6",
        "lr_scheduler_factor": "0.1",
        "optimizer": "rmsprop",
        "momentum": "0.9",
        "weight_decay": "0.0005",
        "overlap_threshold": "0.5",
        "nms_threshold": "0.45",
        "image_shape": "300",
        "label_width": "350",
        "num_training_samples": str(num_training_samples)
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 86400
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",  # NB. Augmented Manifest
                    "S3Uri": s3_train_data_path,
                    "S3DataDistributionType": "FullyReplicated",
                    "AttributeNames": attribute_names  # NB. This must correspond to the JSON field names in your **TRAIN** augmented manifest.
                }
            },
            "ContentType": "application/x-recordio",
            "RecordWrapperType": "RecordIO",
            "CompressionType": "None"
        },
        {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "AugmentedManifestFile",  # NB. Augmented Manifest
                    "S3Uri": s3_validation_data_path,
                    "S3DataDistributionType": "FullyReplicated",
                    "AttributeNames": ["source-ref", "ValidateBird"]  # NB. This must correspond to the JSON field names in your **VALIDATION** augmented manifest.
                }
            },
            "ContentType": "application/x-recordio",
            "RecordWrapperType": "RecordIO",
            "CompressionType": "None"
        }
    ]
}
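
Compared with the failing parameters, the channels in this run use "application/x-recordio" as the content type and "RecordIO" as the record wrapper. The sketch below flags channels that differ from those settings before a job is submitted. It encodes only the differences visible in this post, not a full validation of the request, and the helper name `channel_warnings` is hypothetical.

```python
def channel_warnings(input_data_config):
    """List differences from the channel settings used in the successful run."""
    warnings = []
    names = [ch.get("ChannelName") for ch in input_data_config]
    for required in ("train", "validation"):
        if required not in names:
            warnings.append(f"missing channel: {required}")
    for ch in input_data_config:
        s3 = ch.get("DataSource", {}).get("S3DataSource", {})
        if s3.get("S3DataType") != "AugmentedManifestFile":
            continue  # only augmented-manifest channels are checked here
        name = ch.get("ChannelName")
        if ch.get("RecordWrapperType") != "RecordIO":
            warnings.append(f"{name}: RecordWrapperType is not 'RecordIO'")
        if ch.get("ContentType") != "application/x-recordio":
            warnings.append(f"{name}: ContentType is not 'application/x-recordio'")
    return warnings


# A train channel configured like the one in the failing parameters.
failing_channel = {
    "ChannelName": "train",
    "DataSource": {"S3DataSource": {"S3DataType": "AugmentedManifestFile"}},
    "ContentType": "image/jpeg",
    "RecordWrapperType": "None",
}
for warning in channel_warnings([failing_channel]):
    print(warning)
```

Calling `channel_warnings(training_params["InputDataConfig"])` on the successful parameters returns an empty list.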

Training Augmented Manifest File generated during the running of the training job

Line 1

Line 2

Line 3

I then unzip the model.tar file to get the following files: hyperparams.JSON, model_algo_1-0000.params and model_algo_1-symbol.

hyperparams.JSON looks like this:

{"label_width": "350", "early_stopping_min_epochs": "10", "epochs": "5", "overlap_threshold": "0.5", "lr_scheduler_factor": "0.1", "_num_kv_servers": "auto", "weight_decay": "0.0005", "mini_batch_size": "1", "use_pretrained_model": "1", "freeze_layer_pattern": "", "lr_scheduler_step": "3,6", "early_stopping": "False", "early_stopping_patience": "5", "momentum": "0.9", "num_training_samples": "11", "optimizer": "rmsprop", "_tuning_objective_metric": "", "early_stopping_tolerance": "0.0", "learning_rate": "0.001", "kv_store": "device", "nms_threshold": "0.45", "num_classes": "1", "base_network": "resnet-50", "nms_topk": "400", "_kvstore": "device", "image_shape": "300"}
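
The unpacking step can also be scripted. This is a minimal sketch using only the standard library; the function name `load_hyperparams` and the paths in the usage comment are hypothetical, and the hyperparameter file is matched case-insensitively because its exact casing inside the archive is an assumption.

```python
import json
import tarfile
from pathlib import Path


def load_hyperparams(archive_path, extract_dir):
    """Unpack the model archive and return the parsed hyperparameter file."""
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)
    # The default read mode detects gzip compression automatically.
    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_dir)
    for path in extract_dir.rglob("*"):
        if path.is_file() and path.name.lower() == "hyperparams.json":
            return json.loads(path.read_text())
    raise FileNotFoundError("no hyperparams file found in the archive")


# Usage with hypothetical local paths:
#   hyperparams = load_hyperparams("model.tar.gz", "model")
#   print(hyperparams["num_classes"])
```

Comparing the returned values with the "HyperParameters" passed to the training job is a quick way to confirm which settings the job actually used.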


  • Thank you again for your help. All of the suggestions helped me get further. After receiving a response on the AWS forum pages, I finally got it working.

    I realized that my JSON differed slightly from the format in the augmented manifest training guide. Going back to basics, I created another labelling job, but used the 'Bounding Box' task type instead of the 'Custom - Bounding box template'. The output then matched the expected format, and training ran with no errors!

    As my goal was to have multiple labels, I edited the files and the class mapping in my output manifests, which also worked!



    The original mapping was 0:'Bird' for all images through the labelling job.
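
    The exact manifest edits are not shown in this post. The sketch below illustrates one way such an edit could look, assuming the record layout that a Ground Truth 'Bounding Box' job writes: an "annotations" list with a "class_id" per box under the label attribute, and a "class-map" under the matching "-metadata" attribute. The helper name `relabel_record`, the sample record, and the class name "SpeciesA" are hypothetical.

```python
import json


def relabel_record(manifest_line, label_attr, new_class_id, class_name):
    """Assign one class to every box in a manifest record and update its class map."""
    record = json.loads(manifest_line)
    for annotation in record[label_attr]["annotations"]:
        annotation["class_id"] = new_class_id
    # Keep the metadata consistent with the edited annotations.
    record[f"{label_attr}-metadata"]["class-map"] = {str(new_class_id): class_name}
    return json.dumps(record)


# Hypothetical record in which every box was labelled 0: 'Bird'.
sample_line = json.dumps({
    "source-ref": "s3://example-bucket/images/bird-001.jpg",
    "ValidateBird": {
        "annotations": [{"class_id": 0, "left": 10, "top": 20, "width": 50, "height": 40}],
        "image_size": [{"width": 300, "height": 300, "depth": 3}],
    },
    "ValidateBird-metadata": {"class-map": {"0": "Bird"}},
})

print(relabel_record(sample_line, "ValidateBird", 1, "SpeciesA"))
```

    Applying such a function line by line to a copy of the manifest leaves the original labelling output untouched, and "num_classes" in the hyperparameters would then need to match the number of distinct class IDs.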