machine-learning nlp machine-translation moses

Documentation of Moses (statistical machine translation) moses.ini file format?

Is there any documentation of the moses.ini format for Moses? Running moses at the command line without arguments lists the available feature names, but not their arguments. Additionally, as far as I can see, the manual does not specify the structure of the .ini file.

Solution

The main idea is that the file contains the settings used by the translation model. The values and options in moses.ini should therefore be looked up in the specifications of the individual Moses features.

Here are some excerpts about moses.ini that I found on the Web.

The Moses Core documentation gives some details:

7.6.5 moses.ini
All feature functions are specified in the [feature] section. Each entry should be in the format:
Feature-name key1=value1 key2=value2 ...
For example:
KENLM factor=0 order=3 num-features=1 lazyken=0 path=file.lm.gz
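To make the "Feature-name key=value ..." layout concrete, here is a minimal Python sketch that splits one [feature] line into the feature name and a dictionary of its arguments. The helper name parse_feature_line is hypothetical (it is not part of Moses), and it assumes that values contain no spaces and that every argument has the form key=value.

```python
def parse_feature_line(line: str) -> tuple[str, dict[str, str]]:
    """Split a [feature] entry into (feature name, {key: value}).

    Hypothetical helper: assumes whitespace-separated tokens and no
    spaces inside values.
    """
    name, *pairs = line.split()  # first token is the feature name
    args = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key] = value  # values are kept as strings
    return name, args


name, args = parse_feature_line(
    "KENLM factor=0 order=3 num-features=1 lazyken=0 path=file.lm.gz"
)
print(name)           # KENLM
print(args["order"])  # 3
```

Values stay as strings because each feature interprets its own arguments; converting them (for example order to an int) is left to the caller.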

The documentation also explains how to print basic statistics about all the components referenced in moses.ini:

Run the script
analyse_moses_model.pl moses.ini
This is useful for setting the order of mapping steps to avoid an explosion of translation options, or simply for checking that the model components are as large and detailed as expected.

In the Center for Computational Language and EducAtion Research (CLEAR) Wiki, there is a sample file with some documentation:

Parameters

It is recommended to create an .ini file to store all of your settings.

input-factors
- Whether a factored model is used or not
mapping
- The decoding steps: T for a translation step (using a phrase table), G for a generation step
ttable-file
- Indicates the source factor(s), the target factor(s), the number of scores, and the path to the translation table file
lmodel-file
- Indicates the LM implementation (0: SRILM, 1: IRSTLM), the factor used, the order (n-gram) of the LM, and the path to the language model file
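The lmodel-file description above maps directly onto the four whitespace-separated fields of one line in that section. A small Python sketch of reading such a line follows; the class LModelSpec, the helper parse_lmodel_line and the example path are hypothetical, the factor is assumed to be a single integer, and only the two type codes mentioned on the CLEAR page are named.

```python
from dataclasses import dataclass

# Only the two codes listed in the excerpt above; anything else is reported as unknown.
LM_TYPES = {0: "SRILM", 1: "IRSTLM"}


@dataclass
class LModelSpec:
    lm_type: int  # LM implementation code
    factor: int   # factor the LM is applied to (assumed to be a single index)
    order: int    # n-gram order
    path: str     # path to the language model file

    def type_name(self) -> str:
        return LM_TYPES.get(self.lm_type, f"unknown ({self.lm_type})")


def parse_lmodel_line(line: str) -> LModelSpec:
    # Raises ValueError if the line does not have exactly four fields.
    lm_type, factor, order, path = line.split()
    return LModelSpec(int(lm_type), int(factor), int(order), path)


spec = parse_lmodel_line("0 0 3 /data/lm/europarl.en.lm.gz")  # hypothetical path
print(spec.type_name(), spec.order)  # SRILM 3
```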

If that is not enough, there is another description on this page; see the "Decoder configuration file" section:

The sections [ttable-file] and [lmodel-file] contain pointers to the phrase table file and the language model file, respectively. You may disregard the numbers on those lines. For the time being, it is enough to know that the last of the numbers in the language model specification is the order of the n-gram model.

The configuration file also contains some feature weights. Note that the [weight-t] section has 5 weights, one for each feature contained in the phrase table.
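In this older layout each section starts with a [name] heading and its values follow one per line. The Python sketch below groups the lines by section so you can check, for example, that [weight-t] holds one weight per phrase-table score. The helper read_sections and the sample file contents are hypothetical; it assumes lines starting with # are comments and that blank lines carry no meaning.

```python
# Hypothetical excerpt in the older moses.ini layout described above.
SAMPLE_INI = """\
# sample configuration (paths are made up)
[ttable-file]
0 0 5 /data/model/phrase-table.gz

[lmodel-file]
0 0 3 /data/lm/europarl.en.lm.gz

[weight-t]
0.2
0.2
0.2
0.2
0.2
"""


def read_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty, non-comment lines under their [section] heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


sections = read_sections(SAMPLE_INI)
weights = [float(w) for w in sections["weight-t"]]
print(len(weights))  # 5, one per score in the phrase table
```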

The moses.ini file created by the training process will not work with your decoder without modification, because it relies on a language model library that is not compiled into our decoder. To make it work, open the moses.ini file and find the language model specification on the line immediately after the [lmodel-file] heading. The first number on this line will be 0, which stands for SRILM. Change it to 8 and leave the rest of the line untouched. Your configuration should then work.
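If you have to apply that manual edit to several configuration files, the following Python sketch automates it. patch_lm_type is a hypothetical helper: it only rewrites the first field of the line directly after [lmodel-file], and only when that field is exactly 0, leaving the rest of the line (and every other section) byte-for-byte unchanged.

```python
import re


def patch_lm_type(ini_text: str, new_type: int = 8) -> str:
    """Replace a leading 0 on the line after [lmodel-file] with new_type."""
    lines = ini_text.splitlines(keepends=True)
    for i, line in enumerate(lines[:-1]):  # lines[i + 1] always exists here
        if line.strip() == "[lmodel-file]":
            # "0" must be a whole field (followed by whitespace), so "01" is not touched.
            lines[i + 1] = re.sub(r"^(\s*)0(?=\s)", rf"\g<1>{new_type}", lines[i + 1], count=1)
    return "".join(lines)


before = (
    "[ttable-file]\n"
    "0 0 5 /data/model/phrase-table.gz\n"
    "\n"
    "[lmodel-file]\n"
    "0 0 3 /data/lm/europarl.en.lm.gz\n"
)
after = patch_lm_type(before)
print(after.splitlines()[4])  # 8 0 3 /data/lm/europarl.en.lm.gz
```

Note that the [ttable-file] line, which also starts with 0, is left alone. In practice you would read the real file (for example with pathlib.Path("moses.ini").read_text()) and write the result to a copy, so the original configuration is kept.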