OCR using google-cloud-vision - Result does not contain Unicode characters for Polish, German, etc.


I am trying to use the OCR feature of the Google Vision API, but I am not getting the expected result. I expect to see ü for German and ć, ń, ó, ś, ź, ł, ę, ą for Polish in the results. Is there a way to do this?

The returned text does not contain Unicode characters for several languages, for example Polish and German. However, these languages are in the list of supported languages, and the language was detected correctly.

I use the drag-and-drop option at https://cloud.google.com/vision/ and the CloudVision Android Sample. Thank you for any advice.


Solution

  • I solved this problem. To get Unicode characters in the result, you need to set LanguageHints.

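    In Java, it looks like this:

    // Hint that the image contains Polish text ("pl" is the language code).
    ImageContext imageContext = new ImageContext();
    List<String> languages = new ArrayList<>();
    languages.add("pl");
    imageContext.setLanguageHints(languages);
    // Attach the context to the existing request before sending it.
    annotateImageRequest.setImageContext(imageContext);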

    Now I have ć, ń, ó, ś, ź, ł, ę, ą for Polish in the results.
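
If you call the REST endpoint (https://vision.googleapis.com/v1/images:annotate) directly instead of using the Java client, the same hint should go into the `imageContext.languageHints` field of each request. Below is a minimal sketch of how that JSON body could be assembled with only the standard library. `LanguageHintRequest` and `buildRequestBody` are hypothetical names I made up for illustration, not part of any Google library, and the sketch assumes the language codes and base64 string need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of any Google client library.
class LanguageHintRequest {

    // Builds a JSON body for an images:annotate call that requests
    // TEXT_DETECTION and passes the given language hints.
    // Assumption: inputs contain no characters that need JSON escaping.
    static String buildRequestBody(String base64Image, List<String> languageHints) {
        // Render the hints as a JSON array, e.g. ["pl","de"].
        StringJoiner hints = new StringJoiner(",", "[", "]");
        for (String code : languageHints) {
            hints.add("\"" + code + "\"");
        }
        return "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + base64Image + "\"},"
                + "\"features\":[{\"type\":\"TEXT_DETECTION\"}],"
                + "\"imageContext\":{\"languageHints\":" + hints + "}"
                + "}]}";
    }

    public static void main(String[] args) {
        // "QUJD" is placeholder base64 data, not a real image.
        System.out.println(buildRequestBody("QUJD", Arrays.asList("pl", "de")));
    }
}
```

For real use, a JSON library would be a safer choice than string concatenation; the point here is only where the hints sit in the request.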