google-cloud-platform google-vision

Why does Google Vision recognize 'und' locale for images containing just numbers in text?


I've provided an image to the Google Cloud Vision OCR API to be annotated. The image just contained a phone number.

Google Cloud Vision said the locale of the text was 'und'. Does this mean undefined? I'm not finding any information in the documentation.


Solution

  • Indeed, 'und' is not among the language codes listed in the documentation. And because the image contained nothing but numbers, there was no language for the API to detect.

    However, the documentation also states that the Vision API uses BCP-47 identifiers, and 'und' is listed there as a Non-specific Language Tag. You can also find the clarification that "The special value 'und' (Undetermined) has a 'Scope' of 'special'", where 'special' is defined as:

    'special' - Indicates a special language code. These are subtags used for identifying linguistic attributes not particularly associated with a concrete language. These include codes for when the language is undetermined or for non-linguistic content.

    Therefore, "the 'und' (Undetermined) primary language subtag identifies linguistic content whose language is not determined".
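
If you need to handle this case in code, a minimal sketch could look like the following. It assumes you have already read the locale string from the OCR response (for example, the 'und' value from the question) and only shows how to interpret it. The helper names `primary_subtag`, `is_undetermined`, and `describe_locale` are hypothetical and not part of any Google client library.

```python
# Primary language subtags that the IANA registry marks with Scope "special".
SPECIAL_SUBTAGS = {
    "und": "undetermined",
    "zxx": "no linguistic content",
    "mul": "multiple languages",
    "mis": "uncoded languages",
}


def primary_subtag(locale: str) -> str:
    """Return the lowercased primary language subtag of a BCP-47 tag."""
    return locale.replace("_", "-").split("-")[0].lower()


def is_undetermined(locale: str) -> bool:
    """Treat an empty locale or an 'und' tag as 'no language detected'."""
    return not locale or primary_subtag(locale) == "und"


def describe_locale(locale: str) -> str:
    """Map special subtags to a readable label; otherwise return the subtag."""
    if not locale:
        return SPECIAL_SUBTAGS["und"]
    subtag = primary_subtag(locale)
    return SPECIAL_SUBTAGS.get(subtag, subtag)


# Hypothetical usage with the value reported in the question.
locale_from_ocr = "und"
if is_undetermined(locale_from_ocr):
    print("No language detected; the text may be digits or symbols only.")
else:
    print("Detected language:", describe_locale(locale_from_ocr))
```

Treating an empty string the same as 'und' is a design choice in this sketch, not documented API behaviour; adjust it if your application needs to tell the two apart.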