tensorflow, google-cloud-ml, google-cloud-ml-engine

Google Cloud ML Engine: passing multiple file paths as arguments


I am trying to run a job on Google Cloud ML Engine, but I can't seem to pass multiple file paths as arguments to the parser. Here is what I type in the terminal:

JOB_NAME=my_job_name
BUCKET_NAME=my_bucket_name
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
DATA_PATH=gs://$BUCKET_NAME/my_data_directory
REGION=us-east1

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.2 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    -- \
    --file-path "${DATA_PATH}/*" \
    --num-epochs 10 

Here, my_data_directory contains multiple files that I later want to read. The problem is that --file-path ends up containing only ['gs://my_bucket_name/my_data_directory'] rather than a list of the files in that directory.

How do I fix this?

Many thanks in advance.


Solution

  • Everything you pass after the bare -- line is a user argument, so how it is handled depends largely on the trainer you defined. Note also that the quoted wildcard is not expanded by your local shell, and the shell could not list a gs:// location in any case. I would go back and modify the trainer program so that it either treats the directory differently (for example, by listing its contents itself) or takes multiple paths, like this:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --job-dir $OUTPUT_PATH \
        --runtime-version 1.2 \
        --module-name trainer.task \
        --package-path trainer/ \
        --region $REGION \
        --scale-tier STANDARD_1 \
        -- \
        --train-files $TRAIN_DATA \
        --eval-files $EVAL_DATA \
        --train-steps 1000 \
        --verbosity DEBUG  \
        --eval-steps 100
    

    Some links that will be helpful for developing your own trainer: [1] [2]
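
To make the two options above concrete, here is a minimal sketch of the trainer side, assuming the trainer parses its arguments with argparse. The function names parse_args and expand_paths are hypothetical, not part of any ML Engine API. The sketch accepts one or more values for --file-path and expands wildcards inside the trainer. It defaults to Python's local glob so it can be tried anywhere. For gs:// paths you would pass a glob function that understands Cloud Storage, such as tf.gfile.Glob in TensorFlow 1.x (tf.io.gfile.glob in later releases).

```python
import argparse
import glob


def parse_args(argv=None):
    """Parse trainer arguments; --file-path accepts one or more values."""
    parser = argparse.ArgumentParser()
    # nargs='+' collects every value after --file-path into a list.
    parser.add_argument('--file-path', nargs='+', required=True,
                        help='One or more paths or wildcard patterns.')
    parser.add_argument('--num-epochs', type=int, default=10)
    # parse_known_args tolerates extra arguments the trainer does not
    # declare (assumption: the service may append some, e.g. --job-dir).
    args, _unknown = parser.parse_known_args(argv)
    return args


def expand_paths(patterns, glob_fn=glob.glob):
    """Expand each pattern into a sorted list of matching files.

    glob_fn is a hypothetical injection point: the default only sees the
    local filesystem, so pass a GCS-aware function for gs:// patterns.
    """
    files = []
    for pattern in patterns:
        files.extend(sorted(glob_fn(pattern)))
    return files


if __name__ == '__main__':
    args = parse_args()
    for path in expand_paths(args.file_path):
        print(path)
```

With a trainer along these lines, the submit command can keep the quoted "${DATA_PATH}/*" pattern (expanded inside the trainer), or list several explicit paths after --file-path separated by spaces.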