Tags: node.js, google-cloud-platform, grpc, audio-streaming, dialogflow-cx

Google Dialogflow CX | StreamingDetectIntent doesn't process audio after matching first intent


Environment details

  • OS: Windows 10, 11. Debian 9 (stretch)
  • Node.js version: 12.18.3, 12.22.1
  • npm version: 7.19.0, 7.15.0
  • @google-cloud/dialogflow-cx version: 2.13.0

Issue

StreamingDetectIntent doesn't process audio after matching the first intent. I can see the transcription, and the first intent is matched, but after that the audio keeps streaming while I receive no further transcription and the on('data') callback is never triggered. In short, nothing happens after the first intent is matched.


The only workaround I've found is to end the detectStream and then reinitialize it; after that, it works as expected. A sketch of this follows.
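
For reference, a minimal sketch of that workaround, reusing the client and initialStreamRequest from the repro below (createDetectStream is an illustrative helper, not a library API):

// Illustrative helper: build a fresh stream with the same handlers.
function createDetectStream() {
    return client
        .streamingDetectIntent()
        .on('error', console.error)
        .on('data', data => {
            if (data.detectIntentResponse) {
                // An intent matched: this stream won't process further audio,
                // so tear it down and start over with a new one.
                detectStream.end();
                detectStream = createDetectStream();
                detectStream.write(initialStreamRequest); // resend the audio config
            }
        });
}

let detectStream = createDetectStream();
detectStream.write(initialStreamRequest);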

Steps to reproduce

I've tried both const {SessionsClient} = require("@google-cloud/dialogflow-cx"); and const {SessionsClient} = require("@google-cloud/dialogflow-cx").v3;

// Create a stream for the streaming request.
const detectStream = client
    .streamingDetectIntent()
    .on('error', console.error)
    .on('end', () => {
        // Note: the stream's 'end' event carries no payload.
        console.log('streamingDetectIntent: -----End-----');
    })
    .on('data', data => {
        console.log(`streamingDetectIntent: Data: ----------`);
        if (data.recognitionResult) {
            console.log(`Intermediate Transcript: ${data.recognitionResult.transcript}`);
        } else {
            console.log('Detected Intent:');
            if (!data.detectIntentResponse) return;
            const result = data.detectIntentResponse.queryResult;

            console.log(`User Query: ${result.transcript}`);
            for (const message of result.responseMessages) {
                if (message.text) {
                    console.log(`Agent Response: ${message.text.text}`);
                }
            }
            if (result.match.intent) {
                console.log(`Matched Intent: ${result.match.intent.displayName}`);
            }
            console.log(`Current Page: ${result.currentPage.displayName}`);
        }
    });

const initialStreamRequest = {
        session: sessionPath,
        queryInput: {
            audio: {
                config: {
                    audioEncoding: encoding,
                    sampleRateHertz: sampleRateHertz,
                    singleUtterance: true,
                },
            },
            languageCode: languageCode,
        }
    };
detectStream.write(initialStreamRequest);

I've tried streaming audio both from files (.wav) and from the microphone, but both result in the same behavior.

const pump = require('util').promisify(require('pump'));
const {Transform} = require('stream');

await pump(
    recordingStream, // microphone stream <OR> fs.createReadStream(audioFileName),
    // Format each raw audio chunk into the streaming request format.
    new Transform({
        objectMode: true,
        transform: (obj, _, next) => {
            next(null, {queryInput: {audio: {audio: obj}}});
        },
    }),
    detectStream
);

I've referred to this implementation and this RPC-based doc as well, but couldn't find any reason why this should not work.

Thanks!


Solution

  • This seems to be the intended behavior according to the documentation:

    When Dialogflow detects the audio's voice has stopped or paused, it ceases speech recognition and sends a StreamingDetectIntentResponse with a recognition result of END_OF_SINGLE_UTTERANCE to your client. Any audio sent to Dialogflow on the stream after receipt of END_OF_SINGLE_UTTERANCE is ignored by Dialogflow.

    So that seems to be why StreamingDetectIntent doesn't process audio after matching the first intent. The same documentation also says:

    After closing a stream, your client should start a new request with a new stream as needed

    You should start another stream for each utterance; a sketch follows below. You can check other GitHub issues on the same topic as well.
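
A minimal sketch of that restart pattern, in the same Node.js setup as above. detectOneUtterance and pipeAudio are illustrative names, not library APIs; the key points are ending the stream once END_OF_SINGLE_UTTERANCE arrives and reusing the same session path so conversation state carries over between streams:

const {SessionsClient} = require('@google-cloud/dialogflow-cx');

// Illustrative helper: one streamingDetectIntent stream handles exactly
// one utterance and resolves with its query result.
function detectOneUtterance(client, sessionPath, queryInput, pipeAudio) {
    return new Promise((resolve, reject) => {
        const stream = client
            .streamingDetectIntent()
            .on('error', reject)
            .on('data', data => {
                const rec = data.recognitionResult;
                if (rec && rec.messageType === 'END_OF_SINGLE_UTTERANCE') {
                    // Per the docs, audio sent after this point is ignored,
                    // so half-close and wait for the final response.
                    stream.end();
                } else if (data.detectIntentResponse) {
                    resolve(data.detectIntentResponse.queryResult);
                }
            });
        // First message: session + audio config (as in initialStreamRequest).
        stream.write({session: sessionPath, queryInput});
        // Caller writes subsequent {queryInput: {audio: {audio: chunk}}} messages.
        pipeAudio(stream);
    });
}

// One stream per utterance; the shared sessionPath preserves conversation state.
async function conversationLoop(client, sessionPath, queryInput, pipeAudio) {
    for (;;) {
        const result = await detectOneUtterance(client, sessionPath, queryInput, pipeAudio);
        console.log(`Matched Intent: ${result.match.intent.displayName}`);
    }
}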