Trouble with OpenNMT Toy Example (Python3.9)


I recently installed OpenNMT, but I get the following error when going through the toy example.

I am on macOS Big Sur 11.2.1, with both Python 2.7 and Python 3.9 installed.

pip install --upgrade OpenNMT-py==2.0.0rc1

wget https://s3.amazonaws.com/opennmt-trainingdata/toy-ende.tar.gz

tar xf toy-ende.tar.gz

cd toy_ende

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/bin/onmt_build_vocab", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/onmt/bin/build_vocab.py", line 63, in main
    build_vocab_main(opts)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/onmt/bin/build_vocab.py", line 23, in build_vocab_main
    ArgumentParser.validate_prepare_opts(opts, build_vocab_only=True)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/onmt/utils/parse.py", line 127, in validate_prepare_opts
    cls._validate_data(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/onmt/utils/parse.py", line 42, in _validate_data
    cls._validate_file(path_src, info=f'{cname}/path_src')
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/onmt/utils/parse.py", line 18, in _validate_file
    raise IOError(f"Please check path of your {info} file!")
OSError: Please check path of your corpus_1/path_src file!

Solution

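  • The error means that onmt_build_vocab could not find the file given as corpus_1/path_src. The paths in the config are resolved relative to the directory you run the command from, so most likely the commands were run from inside the data directory instead of from the directory that contains toy-ende. You can follow this procedure: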

    pip install --upgrade OpenNMT-py==2.0.0rc1;
    wget https://s3.amazonaws.com/opennmt-trainingdata/toy-ende.tar.gz;
    tar xf toy-ende.tar.gz;
    echo '## Where the samples will be written
    save_data: toy-ende/run/example
    ## Where the vocab(s) will be written
    src_vocab: toy-ende/run/example.vocab.src
    tgt_vocab: toy-ende/run/example.vocab.tgt
    # Prevent overwriting existing files in the folder
    overwrite: False
    
    # Corpus opts:
    data:
        corpus_1:
            path_src: toy-ende/src-train.txt
            path_tgt: toy-ende/tgt-train.txt
        valid:
            path_src: toy-ende/src-val.txt
            path_tgt: toy-ende/tgt-val.txt
    ' > toy_en_de.yaml;
    mkdir -p toy-ende/run;
    touch toy-ende/run/example.vocab.src;
    touch toy-ende/run/example.vocab.tgt;
    onmt_build_vocab -config toy_en_de.yaml -n_sample 10000;
    
    echo '# Vocabulary files that were just created
    src_vocab: toy-ende/run/example.vocab.src
    tgt_vocab: toy-ende/run/example.vocab.tgt
    
    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]
    
    # Where to save the checkpoints
    save_model: toy-ende/run/model
    save_checkpoint_steps: 500
    train_steps: 1000
    valid_steps: 500
    ' >> toy_en_de.yaml;
    onmt_train -config toy_en_de.yaml;
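
Since the original error comes down to a path that does not resolve from the current working directory, it can help to check the corpus paths before calling onmt_build_vocab. The following is only a minimal sketch, not part of OpenNMT: the helper name missing_files is hypothetical, and the paths are simply copied from the data section of the config above.

```python
from pathlib import Path


def missing_files(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    # Paths copied from the data section of toy_en_de.yaml (assumed unchanged).
    corpus_paths = [
        "toy-ende/src-train.txt",
        "toy-ende/tgt-train.txt",
        "toy-ende/src-val.txt",
        "toy-ende/tgt-val.txt",
    ]
    # Report every path that cannot be found from the current directory.
    for path in missing_files(corpus_paths):
        print(f"not found from {Path.cwd()}: {path}")
```

If this prints anything, the config is probably being used from the wrong directory, or the archive was extracted somewhere else.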
    

    If you do not have a GPU, remove or comment out the following lines so that training runs on the CPU:

    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]
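
If you would rather not edit the file by hand, the two GPU lines can be commented out with a few lines of Python. This is a rough sketch under the assumption that the config looks like the one written above. The function name comment_out_gpu_lines is hypothetical, and it does plain text matching on the world_size and gpu_ranks keys instead of parsing the YAML.

```python
# Keys from the "Train on a single GPU" part of the config.
GPU_KEYS = ("world_size:", "gpu_ranks:")


def comment_out_gpu_lines(config_text):
    """Prefix the GPU-related lines of a YAML config with '# '."""
    result = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(GPU_KEYS):
            # Keep the original indentation and comment out the setting.
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    example = "save_model: toy-ende/run/model\nworld_size: 1\ngpu_ranks: [0]\n"
    print(comment_out_gpu_lines(example), end="")
```

To apply it to the real file, read toy_en_de.yaml with pathlib.Path.read_text, pass the text through the function, and write the result back with write_text. Lines that are already commented out are left untouched.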