dialogflow-es · actions-on-google · alexa-skills-kit

Intent sample utterance (training phrase) structure - which is best?


3/28/19 update: Nick from Google did a great job of answering from the Dialogflow side. It would be great to get an answer from the ASK team as well!


We have a voice app available as both an Alexa skill and a Google Action (built with Dialogflow). On both ASK and Dialogflow, we have an intent whose sample utterances / training phrases follow the structure:

leading carrier phrase {main slot} trailing phrase

There are many leading phrases and many trailing phrases. The phrases are short: at most six words, and most have three. Not every sample utterance has both a leading and a trailing phrase. In total there are 100+ sample utterances covering these combinations.

I'm wondering if we should create two new slots, {leading phrase} and {trailing phrase}, populate each with the phrases currently embedded in the sample utterances, and then reduce the 100+ sample utterances to just these four:

{main slot}

{leading phrase} {main slot}

{main slot} {trailing phrase}

{leading phrase} {main slot} {trailing phrase}
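To get a feel for how those four templates stand in for the long utterance list, here's a small Python sketch. The phrase lists and the `expand` helper are hypothetical stand-ins for the real slot values, just to show how the combinations multiply:

```python
from itertools import product

# Hypothetical slot values; the real skill has many more of each.
leading = ["tell me about", "give me details on"]
trailing = ["with french fries", "for lunch"]

templates = [
    "{main}",
    "{lead} {main}",
    "{main} {trail}",
    "{lead} {main} {trail}",
]

def expand(main="{main slot}"):
    """Enumerate every literal utterance the four templates cover."""
    utterances = set()
    for lead, trail in product(leading, trailing):
        for t in templates:
            utterances.add(t.format(main=main, lead=lead, trail=trail))
    return utterances

# Even 2 leading x 2 trailing phrases yield 9 distinct utterances;
# with ~10 of each, the same four templates cover well over 100.
print(len(expand()))
```

This is why the consolidation looks attractive: the slot values grow linearly while the utterances they cover grow combinatorially.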

I think we'll get better logging this way, and it seems cleaner. But I'm nervous about it. What is the expected impact on NLU accuracy of making this change on each platform? Better? Worse? What is the best-practice recommendation for ASK? And for Dialogflow?


@Nick - Thanks for the answer below. Let me paraphrase to check my understanding. If entities are used exclusively, the impact is:

1) if the user speaks an exact match for a combination of known entities, recognition can be better.

2) if the user speaks a phrase that does not match the known entities, the intent won't be matched as strongly as it would be if entities were not used. This could result in the intent not being selected.

Is this correct?

I'm not sure about the case where the utterance is not an exact match to the combination of entities but is close. Is the result that (compared to when entities are not used) the intent will be matched less often? Maybe you can clarify your statement: "If the user is going to prefix or suffix a phrase, and it's irrelevant, it will help Dialogflow's ML matching by fuzzy matching the intent based on similar phrases."

Maybe an example would be helpful. Let's compare these two training phrases:

'tell me about {main slot} with french fries'

'{leading slot} {main slot} {trailing slot}'

Where:

{main slot} contains 'hamburger'

{leading slot} contains 'tell me about' but does NOT contain 'tell us about'

{trailing slot} contains 'with french fries'

Now let's say the user utterance is "tell us about hamburger with french fries". Is a match to this intent more or less likely using slots/entities?
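To make the "close but not exact" case concrete, here is a toy Python comparison using plain string similarity. This is only an illustrative stand-in (Dialogflow's matcher is a proprietary ML model, not edit distance), but it shows why a one-word near-miss can still score highly against a literal training phrase, whereas an entity list that lacks 'tell us about' offers no partial credit:

```python
from difflib import SequenceMatcher

# Literal training phrase vs. a near-miss user utterance.
training_phrase = "tell me about hamburger with french fries"
user_utterance = "tell us about hamburger with french fries"

# A simple similarity ratio (NOT Dialogflow's actual algorithm):
# the utterances differ by a single two-letter word.
ratio = SequenceMatcher(None, training_phrase, user_utterance).ratio()
print(ratio)

# Against an exhaustive entity list, "tell us about" either is or
# isn't a listed value -- there is no notion of "close".
leading_entity_values = {"tell me about"}
print("tell us about" in leading_entity_values)
```

The contrast between a graded score and a hard set-membership test is, in essence, the trade-off being asked about.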


Solution

  • If you define everything in entities, matching can become more accurate, as Dialogflow will apply some biasing to identify the right entity (or to ignore this intent entirely).

    However, an entity really only needs to be used for vocabulary that you're actually interested in. If the user is going to prefix or suffix a phrase, and it's irrelevant, it will help Dialogflow's ML matching by fuzzy matching the intent based on similar phrases.

    Based on your question, it seems like you actually do care about the phrases to some extent, so using entities could be a good fit, and would be easier to maintain than 100 separate training phrases.
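The trade-off described above can be sketched as a toy entity-gated matcher. This is an assumption-laden illustration, not Dialogflow's or ASK's actual algorithm; the entity value sets below are hypothetical. The point it demonstrates: when the prefix and suffix must come verbatim from entity value lists, any unlisted variant falls outside the intent entirely.

```python
# Hypothetical entity value lists standing in for the real slots.
LEADING = {"tell me about", "give me details on"}
TRAILING = {"with french fries", "for lunch"}
MAINS = {"hamburger", "hot dog"}

def matches(utterance: str) -> bool:
    """Try every split of the utterance into lead + main + trail,
    requiring each non-empty piece to appear verbatim in its entity
    value set. Returns True if any split succeeds."""
    words = utterance.split()
    for i in range(len(words) + 1):
        for j in range(i, len(words) + 1):
            lead = " ".join(words[:i])
            main = " ".join(words[i:j])
            trail = " ".join(words[j:])
            if (main in MAINS
                    and (not lead or lead in LEADING)
                    and (not trail or trail in TRAILING)):
                return True
    return False

print(matches("tell me about hamburger with french fries"))  # True
print(matches("tell us about hamburger with french fries"))  # False
```

The second utterance fails only because 'tell us about' is not a listed entity value; a pure training-phrase model would instead score it as a near-match. That is the accuracy-versus-coverage tension behind the question.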