python, json, botframework, attachment, azure-bot-service

Can a user send an attachment along with a message in a TextPrompt in a waterfall step?


Let's say we have the following steps in our waterfall dialog:

self.add_dialog(TextPrompt(TextPrompt.__name__))
self.add_dialog(
    WaterfallDialog(
        WaterfallDialog.__name__,
        [
            self.project_step,
            self.name_step,
            self.confirm_step,
            self.final_step,
        ],
    )
)

async def project_step(
    self, step_context: WaterfallStepContext
) -> DialogTurnResult:
    """
    If a project name has not been provided, prompt for one.
    :param step_context:
    :return DialogTurnResult:
    """
    confluence_details = step_context.options

    if confluence_details.project is None:
        message_text = key.query_project_confluence_text.value + "?"
        prompt_message = MessageFactory.text(
            message_text, message_text, InputHints.expecting_input
        )
        return await step_context.prompt(
            TextPrompt.__name__, PromptOptions(prompt=prompt_message)
        )
    return await step_context.next(confluence_details.project)

If a user sends an attachment along with text to the bot at the prompt, is it possible to get both in step_context.result?

In on_message_activity I could check TurnContext.activity.attachments for an attachment, but how do I receive the same through the waterfall step_context, along with the text message, in the subsequent step?
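
For reference, the kind of check I mean in on_message_activity is roughly this (simplified):

from botbuilder.core import TurnContext

async def on_message_activity(self, turn_context: TurnContext):
    # Outside of any dialog, attachments can be inspected directly on the activity.
    if turn_context.activity.attachments:
        for attachment in turn_context.activity.attachments:
            print(attachment.content_type, attachment.name)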

The request body will be as below:

{
    "text": "Hello there",
    "type": "message",
    "from": {
        "id": "xyz"
    },
    "attachments": [
        {
            "contentType": "audio/wav",
            "name": "BabyElephantWalk60.wav",
            "contentUrl": "data:audio/wav;base64,UklGRvAEAgBXQVZFZm10IBAA..."
        }
    ]
}

Client side, the iOS app will be using the Direct Line API https://directline.botframework.com/v3/directline/conversations/EdWGs8IdmjNIy5j2E93EHW-a/activities to send the activity.

The iOS application is using a speech kit.

At the prompt, whatever the user speaks is to be sent to the bot over Direct Line as both the text message and an audio file of it, in a request body like the one above. This will be done using the mic button.

Is it possible to do so?


Solution

  • You seem to be treating this like a bot question when really it's more of a client question. Your bot can only respond to the activities it receives, so there won't be any way for your bot to handle an activity with both audio and text if the client never sends an activity with both audio and text. Since you're using your own Direct Line client, it's up to you to allow your client to send such an activity. Since audio files are normally very large, I recommend uploading the file rather than putting a data URL in the attachment.
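
    To make that concrete, here is a rough Python sketch (using requests) of the shape of that client-side call; your iOS app would do the equivalent in Swift, and the token, conversation ID, and hosted audio URL are placeholders you would supply yourself:

    import requests

    # Placeholder values; obtain a real token and conversation through the
    # usual Direct Line token/conversation endpoints.
    token = "<direct-line-token>"
    conversation_id = "<conversation-id>"

    activity = {
        "type": "message",
        "from": {"id": "xyz"},
        "text": "Hello there",
        # attachments is a list; a hosted contentUrl is preferable to a
        # large base64 data URL.
        "attachments": [
            {
                "contentType": "audio/wav",
                "name": "BabyElephantWalk60.wav",
                "contentUrl": "https://example.com/BabyElephantWalk60.wav",
            }
        ],
    }

    requests.post(
        f"https://directline.botframework.com/v3/directline/conversations/{conversation_id}/activities",
        headers={"Authorization": f"Bearer {token}"},
        json=activity,
    )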

    Ordinarily, the user would send an attachment and text as separate activities on separate turns. The bot would handle the data on those separate turns by keeping track of a dialog in its state, and it would probably be a waterfall dialog. It sounds like you don't want to do this because the text and the attachment are really the same data in your case.

    On the bot side, you can access the text and attachments of the activity outside of any dialog if you like. You also have direct access to the activity inside a waterfall step following any prompt, because the step context contains the turn context which contains the activity.

    text = step_context.context.activity.text
    attachments = step_context.context.activity.attachments
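
    For example, in the step that runs right after the prompt (name_step in your dialog), a minimal sketch could look like this; the "or []" guard is just defensive, since attachments may be None:

    async def name_step(
        self, step_context: WaterfallStepContext
    ) -> DialogTurnResult:
        # step_context.result holds the text recognized by the TextPrompt,
        # while the raw incoming activity is available on the turn context.
        text = step_context.context.activity.text
        attachments = step_context.context.activity.attachments or []

        for attachment in attachments:
            # content_type, name and content_url mirror the request body above.
            print(attachment.content_type, attachment.name)

        return await step_context.next(text)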
    

    You can do this with a text prompt or an attachment prompt. If you want to be able to access those things in step_context.result, you could make your own prompt that puts the whole activity in the result. You can use ActivityPrompt as a base class since it was made for that purpose.
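
    A minimal sketch of such a prompt could look like the following (FullActivityPrompt is just an illustrative name); the validator here accepts any message carrying text or at least one attachment:

    from botbuilder.dialogs.prompts import ActivityPrompt, PromptValidatorContext

    class FullActivityPrompt(ActivityPrompt):
        """Prompt whose result is the entire incoming activity (text + attachments)."""

        def __init__(self, dialog_id: str):
            super().__init__(dialog_id, FullActivityPrompt.validate)

        @staticmethod
        async def validate(prompt_context: PromptValidatorContext) -> bool:
            # Accept any message that has text or at least one attachment.
            activity = prompt_context.context.activity
            return bool(activity.text or activity.attachments)

    You would register it alongside the TextPrompt and call step_context.prompt(FullActivityPrompt.__name__, ...); the next waterfall step then receives the whole activity in step_context.result.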

    Besides sending separate activities, another alternative to sending the text and the audio in the same activity would be to send just the audio and have the bot convert the audio into text using Cognitive Speech Services. This probably wouldn't be ideal because then your client wouldn't be able to display the text, since it's not doing the conversion on its end. I'm assuming you are having the user provide audio input with the microphone and then converting it into text, rather than having the user enter text and then converting it into audio.
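
    If you did go that route, a rough sketch with the Azure Speech SDK for Python could look like this; the key, region, and downloaded file name are placeholders, and the bot would first have to fetch the attachment referenced by contentUrl:

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
    audio_config = speechsdk.audio.AudioConfig(filename="BabyElephantWalk60.wav")

    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        text = result.text  # transcription the bot could use in place of activity.text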

    Direct Line Speech is a built-in way of leveraging Cognitive Speech Services so that both your client and your bot can access the text. And depending on your needs, you might consider looking into Web Chat speech.