
How to prepare the ImageNet dataset to train ResNet50 (from the official TensorFlow Model Garden)


I'd like to train a ResNet50 model on the imagenet2012 dataset on my local GPU server, following this official TensorFlow page exactly: https://github.com/tensorflow/models/tree/master/official/vision/image_classification#imagenet-preparation. However, I don't know how to prepare the imagenet2012 training and validation data so that I can start the training like this:

# QUESTION: how should DATA_DIR be set once the data has been downloaded?
# Double quotes let the shell expand $NUM_GPUS; single quotes would pass it literally.
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override="runtime.num_gpus=$NUM_GPUS"

Specifically, I have downloaded the dataset as two tar files, ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar, to the \myPath directory, following the instructions at https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md#imagenet2012. Could anyone tell me the exact steps to prepare the dataset and set up the configuration (either via command-line arguments or by editing configs/examples/resnet/imagenet/gpu.yaml)?

PS1: I notice that the training script can use two types of dataset: 1) TFDS and 2) TFRecords. I have created the TFRecords dataset using the shell script at the bottom of the page, but I still don't know how to set up the configuration. TFDS seems to be what TF recommends, but I am fine with the TFRecords format as long as the training runs successfully. I currently have training and validation TFRecords files in the following form:

${DATA_DIR}/train/train-00000-of-01024
${DATA_DIR}/train/train-00001-of-01024
 ...
${DATA_DIR}/train/train-01023-of-01024

${DATA_DIR}/validation/validation-00000-of-00128
${DATA_DIR}/validation/validation-00001-of-00128
 ...
${DATA_DIR}/validation/validation-00127-of-00128
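
Before pointing the trainer at these files, it can help to confirm that no shard is missing. The following is a minimal sketch, assuming only the naming pattern shown in the listing above (1024 training shards and 128 validation shards, each in its own subdirectory); the helper names `expected_shards` and `missing_shards` are hypothetical and are not part of the Model Garden.

```python
from pathlib import Path
from typing import List


def expected_shards(prefix: str, num_shards: int) -> List[str]:
    """Build names such as 'train-00000-of-01024' for every shard index."""
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


def missing_shards(directory: str, prefix: str, num_shards: int) -> List[str]:
    """Return the expected shard names that are not regular files in `directory`."""
    root = Path(directory)
    return [name for name in expected_shards(prefix, num_shards)
            if not (root / name).is_file()]
```

For the layout above, `missing_shards(data_dir + "/train", "train", 1024)` and `missing_shards(data_dir + "/validation", "validation", 128)` should both return an empty list when the conversion finished completely.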

PS2: I hope the TF community can provide a clear step-by-step guide to preparing the ImageNet dataset for a beginner like me. It would be appreciated!


Solution

  • Were you able to get the output of:

    python imagenet_to_gcs.py \
      --raw_data_dir=$IMAGENET_HOME \
      --local_scratch_dir=$IMAGENET_HOME/tf_records \
      --nogcs_upload
    

    in the following format?

    ${DATA_DIR}/train-00000-of-01024
    ${DATA_DIR}/train-00001-of-01024
     ...
    ${DATA_DIR}/train-01023-of-01024
    
    ${DATA_DIR}/validation-00000-of-00128
    ${DATA_DIR}/validation-00001-of-00128
     ...
    ${DATA_DIR}/validation-00127-of-00128
    

    I have read many articles by people doing what you are trying to do, and they followed steps similar to yours, but I cannot tell where you got stuck. Could you share any other information, such as the error message you are getting? That would help me understand the issue better.
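
One piece of information that is easy to collect is whether the shards sit directly in the data directory (as in the listing in this answer) or inside train/ and validation/ subdirectories (as in the question). The sketch below only counts files under each arrangement; which of the two the training script expects is not settled in this thread, and the function name `describe_layout` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def _count_shards(directory: Path, split: str) -> int:
    """Count regular files named '<split>-*-of-*' directly inside `directory`."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.glob(f"{split}-*-of-*") if p.is_file())


def describe_layout(data_dir: str) -> Dict[str, Dict[str, int]]:
    """Report, per split, how many shards are flat in data_dir vs. nested in data_dir/<split>."""
    root = Path(data_dir)
    return {
        split: {
            "flat": _count_shards(root, split),
            "nested": _count_shards(root / split, split),
        }
        for split in ("train", "validation")
    }
```

Posting the result of `describe_layout(data_dir)` together with the exact error message would show which of the two layouts is actually on disk.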