
How can I implement the METEOR score when evaluating a model, using the meteor_score module from nltk?


I currently have two files, reference.txt and model.txt, containing the original captions and the captions generated after training.
Can I simply do the following to obtain the meteor score:

score = nltk.translate.meteor_score.meteor_score(reference, model)
print(np.mean(score))

I have also looked at https://github.com/tylin/coco-caption, but I have no clue how to use it.


Solution

  • Let's start by defining terms:

    Reference: The actual text / ground truth. If multiple people generate the ground truth for the same data point, you will have multiple references, and all of them are assumed to be correct.

    Hypothesis: The candidate / predicted text.

    Let's say two people look at an image and caption it:

    • this is an apple
    • that is an apple

    Now your model looks at the image and predicts:

    • an apple on this tree

    You can calculate how good the prediction is using meteor_score:

    import nltk
    nltk.download('wordnet')  # METEOR needs WordNet for stem/synonym matching

    # newer NLTK versions expect pre-tokenized input (lists of tokens)
    print(nltk.translate.meteor_score.meteor_score(
        ["this is an apple".split(), "that is an apple".split()],
        "an apple on this tree".split()))
    print(nltk.translate.meteor_score.meteor_score(
        ["this is an apple".split(), "that is an apple".split()],
        "a red color fruit".split()))
    

    Output:

    0.6233062330623306
    0.0
    

    In your case, you have to read reference.txt into one list and the model predictions into another, compute the meteor_score for each line in the first list paired with the corresponding line in the second, and finally take the mean, as in the sketch below.
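
    A minimal sketch, assuming the two files are aligned line by line (line i of each file describes the same image) and each image has a single reference caption; with multiple references per image, pass them all in the inner list:

    import nltk
    from nltk.translate.meteor_score import meteor_score

    nltk.download('wordnet')  # METEOR needs WordNet for stem/synonym matching

    with open('reference.txt') as f:
        references = [line.strip() for line in f]
    with open('model.txt') as f:
        hypotheses = [line.strip() for line in f]

    # one METEOR score per (reference, hypothesis) pair
    scores = [meteor_score([ref.split()], hyp.split())
              for ref, hyp in zip(references, hypotheses)]
    print(sum(scores) / len(scores))  # mean METEOR over all captions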