google-cloud-platform ocr cloud-document-ai

How to recognize International Phonetic Alphabet characters during OCR


When running OCR on a dictionary PDF with Document AI, the text often includes IPA characters such as ʷ and ə. Is there a way to get them recognized correctly, for example by setting a language hint? Currently ʷ is recognized as w and ə as a.


Solution

  • Document AI only recognizes IPA characters that also occur in one of its supported languages.

    However, this could be a useful feature, so I filed a feature request on the Public Issue Tracker: https://issuetracker.google.com/287464641
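In the meantime, it may help to measure how many IPA characters are being lost. The following is only a minimal sketch, not part of Document AI: the helper names `find_ipa_chars` and `missing_ipa_chars` are hypothetical, and it assumes that checking the Unicode blocks "IPA Extensions" and "Spacing Modifier Letters" is enough for a rough check. It compares a hand-typed reference transcription with the OCR text and reports the IPA characters that did not survive.

```python
# Unicode blocks that hold most IPA-specific symbols.
# Assumption: these two blocks are sufficient for a rough check.
IPA_BLOCKS = (
    (0x0250, 0x02AF),  # IPA Extensions, includes ə (U+0259)
    (0x02B0, 0x02FF),  # Spacing Modifier Letters, includes ʷ (U+02B7)
)


def find_ipa_chars(text):
    """Return the distinct IPA-block characters in text, sorted by code point."""
    return sorted(
        {ch for ch in text if any(lo <= ord(ch) <= hi for lo, hi in IPA_BLOCKS)}
    )


def missing_ipa_chars(expected, ocr_text):
    """Return IPA characters present in the reference but absent from the OCR text."""
    return sorted(set(find_ipa_chars(expected)) - set(find_ipa_chars(ocr_text)))


if __name__ == "__main__":
    reference = "kʷə"   # what the page actually shows (typed by hand)
    ocr_output = "kwa"  # what the OCR returned, as described in the question
    print(missing_ipa_chars(reference, ocr_output))
```

Running this over a few sample pages would give a concrete list of affected characters, which could be attached to the feature request.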