I uploaded a WAV file and a text file to the Custom Speech portal and got the following error: “Error: normalized text is empty.” The text file is UTF-8 with a BOM and is similar in format to a file that did work. How can I troubleshoot this?
There can be several reasons for the normalized text to be empty. For example, depending on the locale, a sentence that mixes words in Latin and non-Latin characters may be discarded. Words that are repeated multiple times in a row may also cause this. Can you share which locale you're using to import the data? If you can share the text, we can find the reason. Otherwise, you could try reducing the input text (there is no need to cut the audio for this) to find out which part causes the normalization to discard the sentence.
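If you want to narrow it down locally before re-uploading, here is a minimal Python sketch that flags transcript lines showing either of the two patterns mentioned above. It is only a heuristic and does not reproduce the portal's actual normalization. The function names, the repeat threshold of 3, and the assumption that each line may look like `filename<TAB>transcript` are all mine, not something the service defines.

```python
import unicodedata


def has_mixed_scripts(text):
    """Return True if the text contains both Latin and non-Latin letters."""
    latin = non_latin = False
    for ch in text:
        if ch.isalpha():
            # Unicode character names of Latin letters contain the word "LATIN".
            if "LATIN" in unicodedata.name(ch, ""):
                latin = True
            else:
                non_latin = True
    return latin and non_latin


def has_repeated_words(text, min_repeats=3):
    """Return True if any word occurs min_repeats or more times in a row.

    The threshold is a guess; the service's real limit is not known here.
    """
    words = text.lower().split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_repeats:
            return True
    return False


def flag_suspect_lines(lines):
    """Return (line_number, reasons) for every line that looks suspicious."""
    findings = []
    for number, line in enumerate(lines, start=1):
        # Assumption: a line may be "filename<TAB>transcript"; keep the last field.
        text = line.strip().split("\t")[-1]
        if not text:
            continue
        reasons = []
        if has_mixed_scripts(text):
            reasons.append("mixes Latin and non-Latin letters")
        if has_repeated_words(text):
            reasons.append("same word repeated several times in a row")
        if reasons:
            findings.append((number, reasons))
    return findings
```

To run it on your file, read the lines with `open(path, encoding="utf-8-sig")` so the BOM is stripped, pass them to `flag_suspect_lines`, and then try removing or editing the flagged lines first when you reduce the input text.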