
GIZA++: Forbidden zero sentence length 0


I have been using GIZA++ for sentence translation. When I ran it on my test dataset, it displayed the error "ERROR: Forbidden zero sentence length 0". Is there any way to avoid this error?


Solution

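  • I had the same problem with an en-vi (English-Vietnamese) corpus. It happens when the corpus is not clean: the message refers to a sentence of length zero, which suggests an empty line, and overly long sentences cause similar trouble.

    You should clean up your corpus data.

    The Moses clean-corpus-n.perl script does this. The command below keeps only sentence pairs with between 1 and 80 words on each side: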

    ~/mosesdecoder/scripts/training/clean-corpus-n.perl \
        ~/corpus/train en vi \
        ~/corpus/train.clean 1 80
    

    Or you can clean the corpus manually.

    Remove any empty lines and cut each line down to fewer than 100 characters or 80 words, keeping the two sides of the corpus aligned line by line.
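If you prefer to do the manual clean-up with a script, the following is a minimal Python sketch of the idea, not a replacement for clean-corpus-n.perl. It assumes the corpus is two line-aligned, whitespace-tokenized text files; the function names `clean_parallel` and `clean_files` are hypothetical, and the sketch only checks word counts (it does not apply any other filtering the Moses script may perform).

```python
def clean_parallel(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep only sentence pairs whose word counts on BOTH sides lie in
    [min_len, max_len]. Dropping a pair removes the line from both sides,
    so the two outputs stay aligned."""
    kept_src, kept_tgt = [], []
    for src, tgt in zip(src_lines, tgt_lines):
        src_tokens = src.split()  # empty or whitespace-only line -> []
        tgt_tokens = tgt.split()
        if not (min_len <= len(src_tokens) <= max_len):
            continue
        if not (min_len <= len(tgt_tokens) <= max_len):
            continue
        # Re-join to normalize whitespace and strip the trailing newline.
        kept_src.append(" ".join(src_tokens))
        kept_tgt.append(" ".join(tgt_tokens))
    return kept_src, kept_tgt


def clean_files(src_path, tgt_path, out_src_path, out_tgt_path,
                min_len=1, max_len=80):
    """File-based wrapper (hypothetical helper). Returns the number of
    sentence pairs kept. Note: zip() stops at the shorter file, so check
    beforehand that both files have the same number of lines."""
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        kept_src, kept_tgt = clean_parallel(f_src, f_tgt, min_len, max_len)
    with open(out_src_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_src)
    with open(out_tgt_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in kept_tgt)
    return len(kept_src)
```

For example, `clean_files("train.en", "train.vi", "train.clean.en", "train.clean.vi")` would write the filtered pair of files (the file names here are only illustrative). Dropping a pair rather than truncating a long line avoids leaving a half sentence aligned with a full one.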