
Should I retain HTML markup when performing language translation (e.g. via the MS Translate API)?


Is there any advantage in passing HTML fragments to the translation API, as opposed to plain text only? For example, translating the following

Please click <a href='#'>here</a> to continue

returns a valid, translated HTML fragment, but what happens under the hood? Is the returned translation equivalent to the translation of three sentence fragments

Please click > here > to continue

Or the single sentence

Please click here to continue

Why do I ask? I have one or two HTML fragments that are larger than the permitted size, and I need to chunk them up in some way. Using the HtmlAgilityPack I could just replace the HTML document's text nodes with their translated equivalents, but do I lose anything by doing this? Will the quality of the translation improve if I translate whole sentences (i.e. the contents of <H1>, <H2>, <p> tags)?

Many thanks in advance

Duncan


Solution

  • From MSDN here I got the following reply:

    In the translation it matters where the sentence boundary lies. HTML markup has sentence-internal elements, like <em> or <a>, and sentence breaking elements, like <p>. HTML mode will process the elements according to their behavior in HTML document rendering.

    So there it is!
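
Given that reply, replacing individual text nodes would cut a sentence wherever an inline tag such as <a> appears, so chunking at the sentence-breaking elements looks like the safer option. Below is a rough sketch of that idea in Python, using only the standard library rather than HtmlAgilityPack. The BLOCK_TAGS set, the split_segments and batch helpers, and the max_chars limit are all my own illustrative assumptions; none of them come from the translation API, and the real size limit should be taken from its documentation.

```python
from html.parser import HTMLParser

# Assumption: an illustrative (incomplete) set of sentence-breaking tags.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th", "br"}


class BlockSplitter(HTMLParser):
    """Collect the markup between sentence-breaking tags as separate segments.

    Inline tags such as <a> or <em> are kept inside the segment, so each
    segment can be sent for translation as one HTML fragment.
    """

    def __init__(self):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.segments = []
        self._buf = []

    def flush(self):
        text = "".join(self._buf).strip()
        if text:
            self.segments.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # a block tag ends the current sentence
        else:
            self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<br/>, <img/>) arrive here instead of above.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        else:
            self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        self._buf.append(data)

    def handle_entityref(self, name):
        self._buf.append(f"&{name};")

    def handle_charref(self, name):
        self._buf.append(f"&#{name};")


def split_segments(html):
    """Return the inline-level HTML fragments found between block tags."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text after the last block tag
    return parser.segments


def batch(segments, max_chars):
    """Greedily group segments so each group stays within max_chars.

    A single segment longer than max_chars still gets a group of its own.
    """
    batches, current, size = [], [], 0
    for seg in segments:
        if current and size + len(seg) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    page = "<h1>Welcome</h1><p>Please click <a href='#'>here</a> to continue</p>"
    for group in batch(split_segments(page), max_chars=40):
        print(group)
```

Note that this sketch drops the block tags themselves, so writing the translated segments back into the original document still needs a mapping from each segment to the node it came from. With HtmlAgilityPack that would presumably mean translating the inner HTML of each <p> or <h1> node rather than each text node.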