
How can I implement the METEOR score when evaluating a model, using the meteor_score module from nltk?


I currently have two files, reference.txt and model.txt, containing the original captions and the captions generated after training.
Can I simply do the following to obtain the meteor score:

score = nltk.translate.meteor_score.meteor_score(reference, model)
print(np.mean(score))

I have also looked at https://github.com/tylin/coco-caption, but I have no clue how to use it.


Solution

  • Let's start by defining terms:

    Reference: The actual text / ground truth. If multiple people generate the ground truth for the same data point, you will have multiple references, and all of them are assumed to be correct.

    Hypothesis: The candidate / predicted text.

    Let's say two people look at an image and caption it:

    • this is an apple
    • that is an apple

    Now your model looks at the image and predicts:

    • an apple on this tree

    You can calculate how good the prediction is using meteor_score:

    import nltk
    nltk.download('wordnet')  # METEOR needs WordNet for stem/synonym matching

    # newer NLTK versions expect pre-tokenized input (lists of tokens)
    print(nltk.translate.meteor_score.meteor_score(
        ["this is an apple".split(), "that is an apple".split()],
        "an apple on this tree".split()))
    print(nltk.translate.meteor_score.meteor_score(
        ["this is an apple".split(), "that is an apple".split()],
        "a red color fruit".split()))
    

    Output:

    0.6233062330623306
    0.0
    

    In your case, you have to read reference.txt into one list and the model predictions into another, compute the meteor_score for each line in the first list paired with the corresponding line in the second, and finally take the mean, as in the sketch below.
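
    A minimal sketch, assuming the two files are aligned line by line (line i of each file describes the same image) and each image has a single reference caption; with multiple references per image, pass them all in the inner list:

    import nltk
    from nltk.translate.meteor_score import meteor_score

    nltk.download('wordnet')  # METEOR needs WordNet for stem/synonym matching

    with open('reference.txt') as f:
        references = [line.strip() for line in f]
    with open('model.txt') as f:
        hypotheses = [line.strip() for line in f]

    # one METEOR score per (reference, hypothesis) pair
    scores = [meteor_score([ref.split()], hyp.split())
              for ref, hyp in zip(references, hypotheses)]
    print(sum(scores) / len(scores))  # mean METEOR over all captions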