
Conflicts in the training data for Microsoft Custom Translator


I am using Microsoft Custom Translator and providing the training data in TMX format. My training data has some conflicts. For example, in my English-to-German training data, some English strings appear more than once, and each occurrence has a different German translation. How do such conflicts affect the model?


Solution

  • As long as one side differs, the pairs are simply alternative translations, which are common in real-world data. The alternatives are kept, and together they influence the probabilities in the resulting model.
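If you want to see where your TMX data contains such alternatives before uploading it, a minimal Python sketch like the one below groups the German targets by English source and counts them. It assumes a TMX 1.4 file in which each `<tu>` holds one `<tuv>` per language with `xml:lang` codes such as `en-US` and `de-DE`; the helper names `read_pairs` and `find_alternatives` are hypothetical and not part of any Custom Translator API. The counts only give a rough sense of how much weight each alternative has in the data; they are not the probabilities the trained model will actually assign.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(root, src="en", tgt="de"):
    """Yield (source, target) segment pairs from a parsed TMX root element.

    Assumption: language codes are matched on the prefix before "-",
    so "en-US" counts as "en" and "de-DE" counts as "de".
    """
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]


def find_alternatives(pairs):
    """Return {source: Counter(target -> count)} for sources with >1 distinct target."""
    by_source = defaultdict(Counter)
    for source, target in pairs:
        by_source[source][target] += 1
    return {s: c for s, c in by_source.items() if len(c) > 1}


# Hypothetical sample data: one English sentence with two different German translations.
SAMPLE = """<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence"
          adminlang="en-US" o-tmf="none" creationtool="example" creationtoolversion="1"/>
  <body>
    <tu><tuv xml:lang="en-US"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Öffnen Sie die Datei.</seg></tuv></tu>
    <tu><tuv xml:lang="en-US"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Datei öffnen.</seg></tuv></tu>
    <tu><tuv xml:lang="en-US"><seg>Open the file.</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Öffnen Sie die Datei.</seg></tuv></tu>
    <tu><tuv xml:lang="en-US"><seg>Save</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Speichern</seg></tuv></tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    alternatives = find_alternatives(read_pairs(ET.fromstring(SAMPLE)))
    for source, targets in alternatives.items():
        total = sum(targets.values())
        print(source)
        for target, n in targets.most_common():
            print(f"  {n}/{total}  {target}")
```

For the sample, "Open the file." is reported with "Öffnen Sie die Datei." (2 of 3) and "Datei öffnen." (1 of 3), while "Save" is skipped because it has only one translation. For a file on disk, `ET.parse("train.tmx").getroot()` (the file name is a placeholder) can replace `ET.fromstring(SAMPLE)`, which lets the parser honor the encoding declared in the file.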