Tags: python, nltk, metrics, machine-translation

I compare two identical sentences with NLTK's RIBES score and get an error. Why?


I'm trying to use the RIBES score from NLTK to evaluate machine translation quality. I wanted to test it with two identical sentences, but when I run my code I get an error.

My code:

from nltk.translate.ribes_score import sentence_ribes

hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']
ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'which', 'ensures', 'that', 'the', 'military', 'always', 'obeys', 'the', 'commands', 'of', 'the', 'party']

ribes_score = sentence_ribes(ref1a, hyp1)
print(ribes_score)

Errors:

Traceback (most recent call last):
  File "D:/Users/anastasia.emelyanova/PycharmProjects/Metrics_NLTK/ribes_test.py", line 4, in <module>
    ribes_score = sentence_ribes(ref1a, hyp1)
  File "D:\Users\anastasia.emelyanova\AppData\Local\Programs\Python\Python38\lib\site-packages\nltk\translate\ribes_score.py", line 55, in sentence_ribes
    nkt = kendall_tau(worder)
  File "D:\Users\anastasia.emelyanova\AppData\Local\Programs\Python\Python38\lib\site-packages\nltk\translate\ribes_score.py", line 290, in kendall_tau
    tau = 2 * num_increasing_pairs / num_possible_pairs - 1
ZeroDivisionError: division by zero


Process finished with exit code 1

Why am I getting this error? Am I doing something wrong? I used two identical sentences, so there shouldn't be a division by zero: the number of possible pairs should be greater than 1, and two identical sentences should get a score of 1.0. I'm using Python 3 on Windows 7 in PyCharm. Please help!


Solution

  • You're hitting a division by zero on this line:

    tau = 2 * num_increasing_pairs / num_possible_pairs - 1
    

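    This happens because num_possible_pairs is 0 when len(worder) is 1: a single aligned position leaves no pairs to compare. The root cause is that you're calling sentence_ribes with two flat lists, when the first parameter should be a list of lists (a list of reference sentences, where each sentence is a list of words). With a flat list, sentence_ribes iterates over ref1a word by word and treats each word as a separate reference.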

    Try calling it like this instead:

    ribes_score = sentence_ribes([ref1a], hyp1)
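
To see where the zero comes from, here is a minimal sketch in plain Python. It is not NLTK's implementation: kendall_tau_sketch only mirrors the formula from the traceback, assuming num_possible_pairs counts the pairs of aligned positions (n choose 2), and as_reference_list is a hypothetical helper, not part of NLTK, that wraps a single tokenized sentence so it can be passed as the references argument.

```python
from math import comb  # available in Python 3.8+


def kendall_tau_sketch(worder):
    """Kendall's tau over aligned positions, scaled to [-1, 1].

    Simplified stand-in for the formula in the traceback, not NLTK's code.
    """
    n = len(worder)
    # Assumption: the denominator is the number of position pairs,
    # n * (n - 1) / 2, which is 0 whenever n < 2.
    num_possible_pairs = comb(n, 2)
    num_increasing_pairs = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if worder[i] < worder[j]
    )
    return 2 * num_increasing_pairs / num_possible_pairs - 1


def as_reference_list(references):
    """Hypothetical helper: wrap one tokenized sentence in a list."""
    if references and isinstance(references[0], str):
        return [references]
    return references


ref1a = ['It', 'is', 'a', 'guide']

# Iterating over a flat token list yields words, not sentences,
# so the first "reference" seen is the string 'It'.
first_reference = next(iter(ref1a))
print(repr(first_reference))

# Identical word order: every pair is increasing, so tau is 1.0.
print(kendall_tau_sketch([0, 1, 2, 3]))

# A single aligned position: no pairs to compare, hence the error.
try:
    kendall_tau_sketch([0])
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)

# Wrapping the flat list gives the list-of-lists shape described above.
print(as_reference_list(ref1a))
```

With such a helper, the call would look like sentence_ribes(as_reference_list(ref1a), hyp1), which is equivalent to passing [ref1a] directly.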