php, google-speech-api

Garbled Japanese text from the Google Speech API in PHP


The Google Speech API works fine for me when I use 'languageCode' => 'en-US' with an English audio file. But when I use 'languageCode' => 'ja-JP' with a Japanese audio file, it returns broken text like "Transcription: ã‚‚ã—ã‚‚ã—è² ã‘ホンダã—ã¦ã‚‚ã—ã‚‚ã—"

Sample code from Google:

# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;

# Your Google Cloud Platform project ID
$projectId = 'YOUR_PROJECT_ID';

# Instantiates a client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => 'en-US',
]);

# The name of the audio file to transcribe
$fileName = __DIR__ . '/resources/audio.raw';

# The audio file's encoding and sample rate
$options = [
    'encoding' => 'LINEAR16',
    'sampleRateHertz' => 16000,
];

# Detects speech in the audio file
$results = $speech->recognize(fopen($fileName, 'r'), $options);

foreach ($results[0]->alternatives() as $alternative) {
    echo 'Transcription: ' . $alternative['transcript'] . PHP_EOL;
}

I've checked the Cloud Speech API Client Libraries and followed the sample from Google.


Solution

  • The Google Speech API returns the Japanese response correctly inside $results. The default encoding is UTF-8, which is clearly stated in the documentation for Google\Cloud\Language\LanguageClient.

    The problem was the echo in the foreach loop, which is where the Japanese characters came out broken. In my case I don't actually need to echo the transcript; I only use $results directly. So now it's working fine for me.

    If someone does want to use echo to show the result, the following links may be helpful.

    1. PHP Japanese echo string becomes question marks
    2. How to display Japanese characters on a php page?

    Thanks.
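
For anyone who does want to echo the transcript: the broken output in the question looks like UTF-8 bytes being displayed in a single-byte charset such as Windows-1252. Below is a minimal sketch of one common fix, assuming the script is served to a browser and the transcript is valid UTF-8. The helper utf8ContentTypeHeader() is hypothetical (it is not part of the Google client library), and $transcript is a stand-in for $alternative['transcript'].

```php
<?php
// Hypothetical helper, not part of the Google client library: builds the
// Content-Type header value that declares the output as UTF-8.
function utf8ContentTypeHeader(string $mimeType = 'text/plain'): string
{
    return 'Content-Type: ' . $mimeType . '; charset=UTF-8';
}

// Stand-in for $alternative['transcript'] from the question's foreach loop.
$transcript = 'もしもし';

// Sanity check: with the /u modifier, preg_match() only succeeds on valid
// UTF-8, so this confirms the string itself is intact before it is echoed.
$isUtf8 = preg_match('//u', $transcript) === 1;

// Declare the charset before any output is sent. On the command line the
// header is simply ignored, so the terminal itself must be set to UTF-8.
if (!headers_sent()) {
    header(utf8ContentTypeHeader());
}

echo 'Transcription: ' . $transcript . PHP_EOL;
```

If the page is HTML rather than plain text, pass 'text/html' to the helper; adding a meta charset="UTF-8" tag in the page head is a common companion to the header.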