I'm getting the following error while using MultiWorkerMirroredStrategy() for training Custom Estimator on Google AI-Platform (CMLE).
ValueError: Unrecognized task_type: 'master', valid task types are: "chief", "worker", "evaluator" and "ps".
Both MirroredStrategy() and PamameterServerStrategy() are working fine on AI-Platform with their respective config.yaml
files. I'm currently not providing device scopes for any operations. Neither I'm providing any device filter in session config, tf.ConfigProto(device_filters=device_filters)
.
The config.yaml
file which I'm using for training with MultiWorkerMirroredStrategy() is:
trainingInput:
scaleTier: CUSTOM
masterType: standard_gpu
workerType: standard_gpu
workerCount: 4
The masterType
input is mandatory for submitting the training job on AI-Platform.
Note: It's showing 'chief' as a valid task type and 'master' as invalid. I'm providing tensorflow-gpu==1.14.0 in setup.py for trainer package.
(1) This appears to be a bug then with MultiWorkerMirroredStrategy. Please file a bug in TensorFlow. In TensorFlow 1.x, it should be using master and in TensorFlow 2.x, it should be using chief. The code is (wrongly) asking for chief, and AI Platform (because you are using 1.14) is providing only master. Incidentally: master = chief + evaluator.
(2) Do not have add tensorflow to your setup.py. Provide the tensorflow framework you want AI Platform to use using the --runtime-version
(See https://cloud.google.com/ml-engine/docs/runtime-version-list) flag to gcloud.