Tags: python, tensorflow, metrics, rasa-nlu

Rasa NLU: Confidence Score Computation


I am trying to understand what the confidence scores output by Rasa NLU (version 0.12.3) actually are and how they are computed.

I have been working on an intent classification task with the TensorFlow embedding classifier. Once my model is trained and I parse new/test data, I receive a confidence score along with each probable intent, but I have little to no idea what this confidence score actually represents.

As mentioned in the docs, it does not represent a probability. After some observation of the results, it seems to be a one-vs-many type of evaluation, i.e. for a single text input, I can get multiple intents with high confidence scores.

After a quick look at the code, I think it is computed in the “_tf_sim” function in the “embedding_intent_classifier.py” file (relevant code segment below).

Can somebody please confirm or clarify how it works, or what exactly a confidence score means here?

    def _tf_sim(self, a, b):
        """Define similarity"""

        if self.similarity_type == 'cosine':
            # unit-normalize both inputs so the dot product below is a cosine
            a = tf.nn.l2_normalize(a, -1)
            b = tf.nn.l2_normalize(b, -1)

        if self.similarity_type == 'cosine' or self.similarity_type == 'inner':
            # dot product of a with each vector along axis 1 of b
            sim = tf.reduce_sum(tf.expand_dims(a, 1) * b, -1)

            # similarity between intent embeddings
            sim_emb = tf.reduce_sum(b[:, 0:1, :] * b[:, 1:, :], -1)

            return sim, sim_emb
        else:
            raise ValueError("Wrong similarity type {}, "
                             "should be 'cosine' or 'inner'"
                             "".format(self.similarity_type))

Solution

  • The intent classifier intent_classifier_tensorflow_embedding (docs) is based on the StarSpace paper from Facebook.

    In this approach, both the intent examples and their labels are embedded in the same multidimensional space. The feature vector of an intent example and the label of that sentence are each multiplied by a weight matrix, which maps them to an n-dimensional space. During training, these two weight matrices are adjusted so that the mapped example vector is similar to the mapped vector of its label and as dissimilar as possible from the mapped vectors of the other labels. Similarity is measured with cosine similarity, which is 1 if the two vectors point in the same direction and less than 1 for any other angle.
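To make the idea above concrete, here is a minimal pure-Python sketch of the scoring step. It is only an illustration, not Rasa's implementation: the weight matrices `W_TEXT` and `W_LABEL`, the intent names, the one-hot label encoding, and the helper `score_intents` are all hypothetical toy choices, and the "trained" weights are simply hard-coded.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 when they point the same way)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical "trained" weights: 3 input features and 3 one-hot labels,
# both mapped into the same 2-dimensional embedding space.
W_TEXT = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5]]
W_LABEL = [[1.0, 0.0, 0.8],
           [0.0, 1.0, 0.6]]
INTENTS = ["greet", "goodbye", "thanks"]  # made-up intent names


def score_intents(features):
    """Embed the input, embed every label, and compare them by cosine."""
    text_emb = matvec(W_TEXT, features)
    scores = {}
    for i, intent in enumerate(INTENTS):
        one_hot = [1.0 if j == i else 0.0 for j in range(len(INTENTS))]
        label_emb = matvec(W_LABEL, one_hot)
        scores[intent] = cosine_similarity(text_emb, label_emb)
    return scores


scores = score_intents([1.0, 0.2, 0.0])
for intent, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(intent, round(score, 2))
```

With these toy numbers, "greet" and "thanks" both score around 0.9 or higher while "goodbye" scores around 0.2. Each score is computed independently of the others and nothing forces them to sum to 1, which is consistent with the observation in the question that several intents can have high confidence at the same time.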