google-cloud-platform ocr cloud-document-ai

How to recognize International Phonetic Alphabet (IPA) characters during OCR

When running OCR on a dictionary PDF with Document AI, the text often contains IPA characters, e.g. ʷ and ə. Is there a way to get them recognized correctly, such as by setting a particular language hint? Currently ʷ is recognized as w and ə as a.

Solution

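Document AI only recognizes IPA characters that also occur in one of its supported languages. Other IPA symbols are not recognized correctly, and I am not aware of a language hint that changes this.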

However, this could be a useful feature, so I filed a feature request for it in the Public Issue Tracker: https://issuetracker.google.com/287464641
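
In the meantime, it may help to check which IPA characters, if any, survive in the OCR output. The following is a minimal sketch, not part of Document AI: the function name `find_ipa_chars` is hypothetical, and it assumes that the two Unicode blocks "IPA Extensions" and "Spacing Modifier Letters" cover the symbols of interest (IPA also uses ordinary Latin letters, which this does not flag).

```python
import unicodedata

# Assumption: these two Unicode blocks cover the IPA symbols of interest.
IPA_RANGES = [
    (0x0250, 0x02AF),  # IPA Extensions (includes ə, U+0259)
    (0x02B0, 0x02FF),  # Spacing Modifier Letters (includes ʷ, U+02B7)
]


def find_ipa_chars(text):
    """Return (index, character, Unicode name) for each character in IPA_RANGES."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in IPA_RANGES):
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Compare a ground-truth string with what the OCR returned.
    for label, sample in [("expected", "kʷə"), ("ocr", "kwa")]:
        print(label, find_ipa_chars(sample))
```

Running this over the text returned by the processor (for example the `text` field of the response document) shows whether characters such as ʷ and ə came through or were replaced by plain letters.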