
Google Interactive Canvas with RASA


As Dialogflow is closed source and hosted on Google's servers, is it possible to use Interactive Canvas with, e.g., RASA or other alternatives? I was going through the Interactive Canvas tutorial and always had to deploy on Firebase or Google Cloud, because that's what the tutorial said and because services running locally aren't reachable from within Dialogflow.

I want to deploy the fulfillment and webapp at home for myself, not run them on a cloud system.


Solution

  • You have a bunch of related questions here, so it may be easier to answer them if we break them down into smaller ones.

    Do I need to use Dialogflow to write an Action for the Assistant?

    No. Actions on Google defines the Actions SDK, which you can use to define where it sends the results of the speech-to-text (STT) processing it does on what the user says, and how you send back the response you want the user to receive. How you process that text is up to you, but a Natural Language Understanding/Processing (NLU/NLP) system such as RASA is strongly suggested.

    Does this need to run on Firebase or Google Cloud?

    No. It does need to run somewhere, but the only requirements are:

    • You must have a public HTTPS URL endpoint (a webhook) that Google will use to send the STT message.
    • You must be able to accept a POST at that URL with JSON in a format defined by the Actions SDK. You are expected to return JSON that meets the response format defined by the Actions SDK.

    While Firebase Cloud Functions or other Google Cloud solutions work well, you can run it anywhere that meets these requirements. AWS also works, for example.
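
    As a rough illustration of those two requirements, here is a minimal sketch of a webhook using only Python's standard library. The `text` and `reply` field names, the `build_reply` and `handle_post` helpers, and port 8080 are assumptions made for this example; the real request and response JSON must follow the format defined by the Actions SDK. It also serves plain HTTP, so something in front of it (a reverse proxy or a tunnel) would still have to provide the public HTTPS URL.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def build_reply(request_json: dict) -> dict:
    """Turn the parsed request into a reply dict.

    "text" and "reply" are placeholder field names, NOT the real
    Actions SDK schema; substitute the documented format.
    """
    user_text = request_json.get("text", "")
    return {"reply": f"You said: {user_text}"}


def handle_post(body: bytes) -> Tuple[int, bytes]:
    """Parse a raw POST body and return (status code, JSON response body)."""
    try:
        request_json = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, json.dumps({"error": "invalid JSON"}).encode("utf-8")
    if not isinstance(request_json, dict):
        return 400, json.dumps({"error": "expected a JSON object"}).encode("utf-8")
    return 200, json.dumps(build_reply(request_json)).encode("utf-8")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_post(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port: int = 8080) -> None:
    """Serve the webhook on localhost over plain HTTP (assumed port)."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

    Calling `run()` starts the server; the NLU step (for example, handing the text to RASA) would go inside `build_reply`.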

    Can I run this on my own network?

    It depends.

    If your network is a public network with a public IP address (even without a public DNS entry) - then yes.

    If you're on a private network - then... maybe, but you need to do more work. If there is a public IP address available, you might be able to create a proxy to your machine for inbound connections. Alternatively, you could use a tool such as ngrok, or other methods, to create a secure public URL endpoint that tunnels to your local machine. (One advantage of ngrok is that it also takes care of the HTTPS requirement.)

    Once I do this, can I get an Action just for myself?

    Not really.

    You can certainly run an Action in "development mode", but you will need to refresh it periodically in the test console. Similarly, you can add users to it for an alpha release. In both cases, however, the experience makes it clear that you're testing it.

    If you want it to work "just like other Actions", then you will need to submit it for review, which will make it public. To prevent others from using it, you would need to include Google Sign-In to limit who can access your Action.
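
    The "limit who can access" step could be as small as an allowlist check in the webhook. This is only a sketch: `is_allowed` and the example addresses are hypothetical, and it assumes the webhook has already verified the Google Sign-In information and extracted the user's email, which is not shown here.

```python
from typing import Iterable, Optional


def is_allowed(email: Optional[str], allowed: Iterable[str]) -> bool:
    """Return True only if the (already verified) email is on the allowlist."""
    if not email:
        # No signed-in identity available: refuse by default.
        return False
    normalized = email.strip().lower()
    return normalized in {entry.strip().lower() for entry in allowed}
```

    A webhook would call this before doing any real work and send a polite refusal when it returns `False`.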

    What about the Interactive Canvas part?

    The Interactive Canvas adds another requirement on top of everything above. You still need Google to handle what the user says; Google passes that to your Action, which can then pass data along to the Interactive Canvas together with a reply.

    One advantage, however, is that the Interactive Canvas part can also run independently of whatever the Action is doing. So there can be local code that runs on Android or the Smart Display that can do some things, including reacting directly to touch or timing events.

    When you say "local code", what do you mean?

    The code for your Action has to run at a webhook. But the code for the Interactive Canvas runs on the device itself.

    Where is the code for the Interactive Canvas loaded from? Do I need to install an app?

    No. The Interactive Canvas is a web page, and it is loaded from a URL.

    When your Action initiates an Interactive Canvas, it sends the URL to be loaded to your device. Your device then loads it from this URL and then treats it just like most other web pages (with some limitations). It is suggested that you use a single-page webapp, but this isn't required.

    As a single-page webapp, can it make API calls back to the web server? Or other web servers?

    Yes, but keep in mind that CORS restrictions may limit this. The Interactive Canvas runs inside an iframe, and the origin it sends for CORS is null. If the resources you're loading don't allow this origin, the calls may be rejected.

    But if you have an API that is accessible locally to the device where the Interactive Canvas part is running, then you should be able to access it from the Interactive Canvas script.
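
    If you control the API being called, one way to accept the null origin mentioned above is to echo it back in the CORS response header. The helper below is a sketch under that assumption; `cors_headers` and its `allow_null_origin` flag are made-up names, and the returned headers would still need to be attached to the real HTTP response. Note that any sandboxed iframe or local file also sends `Origin: null`, so this is a broad permission.

```python
from typing import Dict, Optional


def cors_headers(origin: Optional[str], allow_null_origin: bool = True) -> Dict[str, str]:
    """Return the CORS response headers for a request's Origin header value."""
    if origin == "null" and allow_null_origin:
        # Echo the literal string "null" so the browser accepts the response.
        return {"Access-Control-Allow-Origin": "null", "Vary": "Origin"}
    # Any other origin gets no CORS headers, so the browser blocks the call.
    return {}
```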

    That sounds like one of the limitations you mentioned. Are there others?

    Yes, there are a few others, too. The biggest ones are that you don't have access to local storage or cookies, and you don't have access to hardware such as cameras and geolocation. You also don't have access to the Web Speech API SpeechRecognition interface.

    Wait, if I don't have access to storage or cookies, how can I handle things between invocations?

    You'll need to use the Action features to save data across conversations.

    And I don't have access to SpeechRecognition? Isn't that kinda silly for a Smart Display Action?

    I didn't say that. You can still do speech recognition using the features that are available through Actions. Anything spoken while the microphone is open is sent to Google for STT, and then sent to your Action.

    If it gets sent to the Action, how can it get sent to the Interactive Canvas?

    As part of the response, you can send data from the Action to the Interactive Canvas. Your Interactive Canvas script can register to handle an onUpdate() callback.

    Updated: What if I don't want to run anything through the Interactive Canvas, but just display something?

    You have a few choices.

    If you just need to present some text, you don't need to do anything at all - just send back a text response from your Action and it shows up on the screen.

    If you need something slightly more complicated, like text and a static image, you can send back a card. There is also a table card if you just need a table.

    But if you want control over the entire screen, you can use Interactive Canvas to just send back HTML. You can format this HTML however you want - either as a fully static page, as a page generated on your website, or as a page generated from client-side JavaScript. You can even use CSS to format it - it's a perfectly normal HTML page.

    Updated: Why wouldn't I use an Interactive Canvas, then?

    There are a bunch of good reasons why you might not want to go through the trouble of an Interactive Canvas, but here are a few basic ones:

    • Right now, you can only get your Interactive Canvas Action approved if it is a game. (If you're only doing this privately, this isn't a concern, of course.) Other types may be allowed soon - but right now, just games.
    • You don't need to use the full screen. Just presenting some data? Text or a Card may be good enough.
    • You want to make sure this also works on a Smart Speaker or "eyes free". Even for ones where you do use the screen, you may want to make sure your users can use it without the screen.
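
    The choices discussed above (plain text, a card, or the Interactive Canvas) can be summed up as a small decision helper. This is purely illustrative: `choose_response_kind`, its flags, and the returned labels are invented for this sketch and are not part of any Actions API.

```python
def choose_response_kind(has_screen: bool,
                         needs_image_or_table: bool,
                         needs_full_screen: bool) -> str:
    """Pick the simplest response type that satisfies the stated needs."""
    if not has_screen:
        # Smart Speaker or "eyes free" use: only a spoken/text reply works.
        return "text"
    if needs_full_screen:
        # Full control over the display calls for the Interactive Canvas.
        return "canvas"
    if needs_image_or_table:
        # Text plus a static image, or a table, fits in a card.
        return "card"
    return "text"
```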