Tags: python, tensorflow, tensor2tensor

How to get the tensors from multiple models and average them?


I am trying to average the tensors of two models that have an identical structure but were trained on different datasets. The models are stored in ckpt files.

I looked at the avg_checkpoints module from tensor2tensor, but I have no idea how to use it.

How do I solve the problem?

from tensor2tensor.utils import avg_checkpoints

print(avg_checkpoints.checkpoint_exists("/"))
# Prints True in the console.
# I copied the final ckpt of each model to the root directory.

avg_checkpoints.main(?)
# No idea what to replace the ? with.

Solution

  • avg_checkpoints.py is an executable script, so you can use it from the command line, e.g.:

    python utils/avg_checkpoints.py \
      --checkpoints path/to/checkpoint1,path/to/checkpoint2 \
      --num_last_checkpoints 2 \
      --output_path where/to/save/the/output
    

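    Note that if the two models were trained from scratch on different datasets, the averaging would most likely not work: independently trained networks do not learn corresponding weights in corresponding positions, so an element-wise average of their parameters usually gives a poor model. If you had a single pre-trained model that you fine-tuned separately on two different datasets, then the averaging could work, because both fine-tuned models start from the same weights.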

    You can average more than two checkpoints. A hacky but simple way to weight the checkpoints is to include a checkpoint multiple times in --checkpoints (and increase --num_last_checkpoints accordingly). For example, listing checkpoint1 twice and checkpoint2 once gives them weights of 2/3 and 1/3.
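Since the question was about triggering the averaging from Python, one option is to invoke the script as a subprocess with the same flags as the command above. This is a sketch, not part of tensor2tensor: the helper names `build_avg_command` and `run_avg_checkpoints` are hypothetical, and the script path `utils/avg_checkpoints.py` assumes you run from the root of a tensor2tensor source checkout, as in the shell command. The flags use the `--flag=value` form.

```python
import subprocess
import sys
from typing import List, Sequence


def build_avg_command(script_path: str,
                      checkpoints: Sequence[str],
                      output_path: str) -> List[str]:
    """Build the argv list for the avg_checkpoints script (hypothetical helper)."""
    if not checkpoints:
        raise ValueError("Need at least one checkpoint to average.")
    return [
        sys.executable,  # the current Python interpreter
        script_path,
        "--checkpoints=" + ",".join(checkpoints),
        # Mirrors the shell example: one entry per listed checkpoint.
        "--num_last_checkpoints=" + str(len(checkpoints)),
        "--output_path=" + output_path,
    ]


def run_avg_checkpoints(script_path: str,
                        checkpoints: Sequence[str],
                        output_path: str) -> None:
    """Run the script; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_avg_command(script_path, checkpoints, output_path),
                   check=True)
```

For example, `run_avg_checkpoints("utils/avg_checkpoints.py", ["path/to/checkpoint1", "path/to/checkpoint2"], "where/to/save/the/output")` would mirror the shell command. Running the script in a separate process avoids having to set up its command-line flags yourself before calling `main`.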
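To see what checkpoint averaging computes, and why repeating a checkpoint acts as a weight, here is a simplified pure-Python sketch. It is not the tensor2tensor implementation: checkpoints are modeled as hypothetical dictionaries mapping variable names to flattened lists of floats, whereas the real script reads tensors from ckpt files. It assumes, as in the question, that all models share an identical structure, and it skips `global_step`-style counters (an assumption here) because they are not model weights.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a checkpoint: variable name -> flattened values.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of every variable across the given checkpoints."""
    if not checkpoints:
        raise ValueError("No checkpoints provided for averaging.")
    averaged: Checkpoint = {}
    for name, values in checkpoints[0].items():
        # Step counters are not weights, so they are left out of the result.
        if name.startswith("global_step"):
            continue
        sums = [0.0] * len(values)
        for ckpt in checkpoints:
            tensor = ckpt[name]  # KeyError if the structures differ
            if len(tensor) != len(sums):
                raise ValueError(f"Shape mismatch for variable {name!r}")
            for i, v in enumerate(tensor):
                sums[i] += v
        # Divide by the number of entries, so a repeated checkpoint counts twice.
        averaged[name] = [s / len(checkpoints) for s in sums]
    return averaged


a = {"w": [1.0, 2.0], "global_step": [100.0]}
b = {"w": [4.0, 8.0], "global_step": [200.0]}

print(average_checkpoints([a, b]))     # {'w': [2.5, 5.0]}
print(average_checkpoints([a, a, b]))  # {'w': [2.0, 4.0]}, i.e. a weighted 2/3, b 1/3
```

Listing `a` twice gives it weight 2/3 and `b` weight 1/3, which is the effect of repeating a checkpoint in `--checkpoints`. The sketch also shows why structure must match: every variable name and shape in the first checkpoint has to exist in all the others.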