Azure Translator Text API: What is the definition of a character?


The pricing of the Translator Text API belonging to the Azure Cognitive Services family is based on characters.

But what is the definition of a character?

Some examples:

  • Do spaces, punctuation, and line breaks count as characters?
This is   ,   a     


test.
  • When translating HTML, does every character count, including angle brackets, tags, slashes, etc.?
<p>This is<br>
&nbsp;&nbsp;&nbsp;a
test.</p>

For the sake of completeness: I suppose only the text that is being sent to the API for translation counts (request characters) and not what comes back (response), right?


Solution

  • This is answered in the documentation on character counts. All of the above examples count as text. Responses do not count. Copying from there:

    What counts is:

    • Text passed to the Translator Text API in the body of the request:

        • Text when using the Translate, Transliterate, and Dictionary Lookup methods

        • Text and Translation when using the Dictionary Examples method

    • All markup: HTML, XML tags, etc. within the text field of the request body. JSON notation used to build the request (for instance "Text:") is not counted.

    • An individual letter

    • Punctuation

    • A space, tab, markup, and any kind of white space character

    • Every code point defined in Unicode

    • A repeated translation, even if you have translated the same text previously
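
To get a rough feel for what a request would cost, the rules above can be sketched in a few lines of Python. This is only an estimate under the assumption that one character equals one Unicode code point and that everything in the text field counts (letters, punctuation, whitespace, markup). The function name `count_billable_characters` is made up for illustration and is not part of any Azure SDK.

```python
def count_billable_characters(texts):
    """Estimate the characters counted for a list of request texts.

    Assumption: every Unicode code point in each text counts,
    including spaces, line breaks, punctuation, and HTML markup.
    Python's len() on a str returns the number of code points.
    """
    return sum(len(text) for text in texts)


# The two examples from the question, as they would appear in the text field.
plain = "This is   ,   a     \n\n\ntest."
html = "<p>This is<br>\n&nbsp;&nbsp;&nbsp;a\ntest.</p>"

print("plain:", count_billable_characters([plain]))
print("html: ", count_billable_characters([html]))
```

Note that this counts code points, not bytes, so a non-ASCII letter or an emoji that is a single code point adds one to the total regardless of its UTF-8 size. Sending the same text twice counts twice, which is why the function simply sums over everything it is given.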