I'm using Google's Text-to-Speech API on the backend and sending the audio to the frontend in the form of an ArrayBuffer. It then gets converted to a URL that is played with audio.play().
This works in Chrome on mobile, Windows, and macOS, but not in Safari.
I've seen a few threads similar to this one, and tried a few of the answers with no luck.
I've tried creating the audioPlayer when the component is created, and just changing the src in playVoice. playVoice is just called from a button onClick.
The frontend functions look like:
const playVoice = (text: string) => {
  getSpeech(text, sourceLanguage, "NEUTRAL").then((res) => {
    const audioPlayer = new Audio();
    audioPlayer.pause();
    audioPlayer.currentTime = 0;
    audioPlayer.src = convertAudio([res.data]);
    audioPlayer.play();
  });
};
with getSpeech being an axios get request:
export const getSpeech = async (
  text: string,
  languageCode: string,
  voice: VoiceTypes
) => {
  return await axios({
    method: "GET",
    url: "/api/speech/",
    responseType: "blob",
    params: {
      text,
      languageCode,
      voice,
    },
  });
};
and convertAudio looks like
export const convertAudio = (buffer: ArrayBuffer[]): string => {
  return URL.createObjectURL(new Blob(buffer));
};
My backend looks something like
const textToSpeech = require("@google-cloud/text-to-speech");
const asyncHandler = require("express-async-handler");
const stream = require("stream");

const client = new textToSpeech.TextToSpeechClient(process.env.SERVICE_ACCOUNT);

const getVoice = asyncHandler(async (req, res) => {
  const { text, languageCode, voice } = req.query;
  const request = {
    input: { text },
    voice: { languageCode, ssmlGender: voice },
    audioConfig: { audioEncoding: "MP3" },
  };
  res.set({
    "Content-Type": "audio/mpeg",
    "Transfer-Encoding": "chunked",
  });
  const [response] = await client.synthesizeSpeech(request);
  const bufferStream = new stream.PassThrough();
  bufferStream.end(Buffer.from(response.audioContent));
  bufferStream.pipe(res);
});
A few notes about the code you've shown:
The HTMLAudioElement constructor accepts a string URL parameter, which the documentation describes this way:
"If a URL is specified, the browser begins to asynchronously load the media resource before returning the new object."
This is advantageous because it allows the audio to be streamed: playback can begin as soon as the browser determines that enough data has been downloaded for the timeline to progress without interruption while the download continues, all without first downloading the entire audio file.
The code you've shown downloads the entire audio file before beginning playback. You can instead construct a source URL (rather than using axios to download the file) and respond to the canplaythrough event to begin playback earlier.
From the event's documentation page:
"The canplaythrough event is fired when the user agent can play the media, and estimates that enough data has been loaded to play the media up to its end without having to stop for further buffering of content."
You just need to create a function which will construct the appropriate URL — here's an example:
// You don't show this type, so here's an example:
type VoiceType = 'NEUTRAL';

function createSpeechUrl (
  text: string,
  languageCode: string,
  voice: VoiceType,
): URL {
  const url = new URL('/api/speech/', window.location.href);
  url.searchParams.set('text', text);
  url.searchParams.set('languageCode', languageCode);
  url.searchParams.set('voice', voice);
  return url;
}
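To make the resulting URL concrete, here is a small self-contained sketch. It assumes a hypothetical `buildSpeechUrl` variant that takes the base URL as an explicit argument (instead of reading `window.location.href`) so it also runs outside a browser; `https://example.com/app/` is only a placeholder origin. Note how `searchParams.set` takes care of encoding spaces and `&` in the text:

```typescript
type VoiceType = 'NEUTRAL';

// Hypothetical variant of createSpeechUrl: the base URL is passed in
// explicitly, so the function does not depend on `window`.
function buildSpeechUrl (
  base: string,
  text: string,
  languageCode: string,
  voice: VoiceType,
): URL {
  // A leading slash resolves the path against the origin of `base`
  const url = new URL('/api/speech/', base);
  url.searchParams.set('text', text);
  url.searchParams.set('languageCode', languageCode);
  url.searchParams.set('voice', voice);
  return url;
}

// Placeholder origin, for illustration only
const example = buildSpeechUrl('https://example.com/app/', 'ding dong & more', 'en-US', 'NEUTRAL');
console.log(example.href);
// https://example.com/api/speech/?text=ding+dong+%26+more&languageCode=en-US&voice=NEUTRAL
```

Because the query string is built through `searchParams`, the server receives the original text (`ding dong & more`) after decoding, with no manual escaping needed.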
You tagged your question with reactjs, so (even though you don't show any React code) I assume you're using React. Below I've prepared a code snippet demonstrating the technique I described above with a simple button rendered by React. I tested this using Chrome and Safari (the browsers you named in the question text), and everything works as expected in those environments.
<div id="root"></div><script src="https://cdn.jsdelivr.net/npm/[email protected]/umd/react.development.js"></script><script src="https://cdn.jsdelivr.net/npm/[email protected]/umd/react-dom.development.js"></script><script src="https://cdn.jsdelivr.net/npm/@babel/[email protected]/babel.min.js"></script><script>Babel.registerPreset('tsx', {presets: [[Babel.availablePresets['typescript'], {allExtensions: true, isTSX: true}]]});</script>
<style>button { font-family: sans-serif; font-size: 1rem; padding: 0.5rem; }</style>
<script type="text/babel" data-type="module" data-presets="tsx,react">
// You don't show this type, so here's an example:
type VoiceType = 'NEUTRAL';

function createSpeechUrl (
  text: string,
  languageCode: string,
  voice: VoiceType,
): URL {
  const url = new URL('/api/speech/', window.location.href);
  url.searchParams.set('text', text);
  url.searchParams.set('languageCode', languageCode);
  url.searchParams.set('voice', voice);
  return url;
}

// Since the Stack Overflow code snippet doesn't have access to your server,
// here is a substitute function pointing to a public, static mp3 URL:
function createSpeechUrlForStackOverflow (...params: any[]): URL {
  // A random doorbell audio sample I found on GitHub
  const url = new URL('https://raw.githubusercontent.com/prof3ssorSt3v3/media-sample-files/65dbf140bdf0e66e8373fccff580ac0ba043f9c4/doorbell.mp3');
  return url;
}

function playVoice (text: string): Promise<HTMLAudioElement> {
  const languageCode = 'en-US';
  const voice = 'NEUTRAL';

  // const url = createSpeechUrl(text, languageCode, voice);
  // Substitute for this SO code snippet:
  const url = createSpeechUrlForStackOverflow(text, languageCode, voice);

  // Instantiate the audio element with the source URL
  // so that it can stream the audio data as early as possible
  // (without waiting for the entire "file" to buffer)
  const audio = new Audio(url.href);

  // Return a promise with the result of attempting playback
  // after enough streaming data has been downloaded.
  // `once: true` removes the listener after its first call, so a later
  // canplaythrough event (which may fire again) does not call play() twice.
  return new Promise<HTMLAudioElement>((resolve, reject) => audio.addEventListener(
    'canplaythrough',
    () => audio.play().then(() => resolve(audio)).catch(reject),
    { once: true },
  ));
}

function App (): React.ReactElement {
  return (<button onClick={() => playVoice('ding-dong')}>Play "ding-dong"</button>);
}

const reactRoot = ReactDOM.createRoot(document.getElementById('root')!);

reactRoot.render(
  <React.StrictMode>
    <App />
  </React.StrictMode>
);
</script>
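One caveat about the snippet: the promise returned by `playVoice` settles only after `canplaythrough` fires, so if the audio resource fails to load it would stay pending. Below is a minimal sketch of one way to cover that case. `waitForEvent` is a hypothetical helper (not part of the code above) that resolves on one event and rejects on another; a plain `EventTarget` stands in for the audio element so the sketch runs outside a browser.

```typescript
// Hypothetical helper: resolves when `okEvent` fires, rejects when
// `failEvent` fires, and removes both listeners once either one has fired.
function waitForEvent (
  target: EventTarget,
  okEvent: string,
  failEvent: string,
): Promise<Event> {
  return new Promise<Event>((resolve, reject) => {
    const cleanup = (): void => {
      target.removeEventListener(okEvent, onOk);
      target.removeEventListener(failEvent, onFail);
    };
    const onOk = (event: Event): void => {
      cleanup();
      resolve(event);
    };
    const onFail = (event: Event): void => {
      cleanup();
      reject(new Error(`"${event.type}" fired before "${okEvent}"`));
    };
    target.addEventListener(okEvent, onOk);
    target.addEventListener(failEvent, onFail);
  });
}

// Stand-in for an audio element (assumption: in the browser you would
// pass the object created with `new Audio(url.href)` instead).
const fakeAudio = new EventTarget();
waitForEvent(fakeAudio, 'canplaythrough', 'error')
  .then((event) => console.log(`ready: ${event.type}`));
fakeAudio.dispatchEvent(new Event('canplaythrough'));
// logs "ready: canplaythrough"
```

In the browser, the same helper could be used as `waitForEvent(audio, 'canplaythrough', 'error').then(() => audio.play())`, so that a failed load surfaces as a rejected promise instead of a promise that never settles.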