I have the following code:
import evaluate

reference1 = "犯人受到了嚴密的監控。"  # Ground Truth
hypothesis1 = "犯人受到嚴密監視。"  # Translated Sentence

metric_meteor = evaluate.load('meteor')
meteor = metric_meteor.compute(predictions=[hypothesis1], references=[reference1])
print("METEOR:", meteor["meteor"])
It returns 0.0.

My question: how can I make the code above produce the same score as the NLTK code below, which gives 98.14814814814815?
from nltk.translate.meteor_score import single_meteor_score
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('fnlp/bart-base-chinese')
tokenized_reference1 = tokenizer(reference1)
tokenized_hypothesis1 = tokenizer(hypothesis1)
print("METEOR:", single_meteor_score(tokenized_reference1, tokenized_hypothesis1) * 100)
Looking at Evaluate's METEOR implementation, it is actually an NLTK wrapper: https://huggingface.co/spaces/evaluate-metric/meteor/blob/main/meteor.py
The problem is that meteor.py uses word_tokenize as its tokenizer, and there doesn't seem to be a way to pass your own tokenizer as an argument (you could file a feature request so that the author adds one).
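That is also why your Evaluate call returns 0.0: word_tokenize has no Chinese word segmentation, so each sentence survives as essentially a single token and nothing matches. Here is a minimal sketch of roughly what meteor.py does internally (assuming NLTK's punkt and wordnet data are already downloaded):

from nltk import word_tokenize
from nltk.translate.meteor_score import single_meteor_score

reference1 = "犯人受到了嚴密的監控。"  # Ground Truth
hypothesis1 = "犯人受到嚴密監視。"  # Translated Sentence

# word_tokenize has no Chinese segmentation rules, so each sentence
# comes back essentially unsplit -- one long "token" per sentence.
print(word_tokenize(reference1))
print(word_tokenize(hypothesis1))

# With no overlapping tokens between reference and hypothesis,
# METEOR falls through to 0.0 -- which is what evaluate reports.
print(single_meteor_score(word_tokenize(reference1), word_tokenize(hypothesis1)))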
You can, however, patch the tokenizer when creating metric_meteor:
import evaluate
from unittest.mock import patch
import nltk
from transformers import AutoTokenizer

reference1 = "犯人受到了嚴密的監控。"  # Ground Truth
hypothesis1 = "犯人受到嚴密監視。"  # Translated Sentence

tokenizer = AutoTokenizer.from_pretrained('fnlp/bart-base-chinese')

with patch.object(nltk, 'word_tokenize', tokenizer):
    metric_meteor = evaluate.load('meteor')
    meteor = metric_meteor.compute(predictions=[hypothesis1], references=[reference1], alpha=0.9, beta=3.0, gamma=0.5)
    print("METEOR:", meteor["meteor"])
Output:
METEOR: 0.9814814814814815
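The same pattern works for a whole batch of sentence pairs, in which case compute returns a single score averaged over the pairs. A quick usage sketch (the second sentence pair is made up purely to show the batch shape):

import evaluate
from unittest.mock import patch
import nltk
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('fnlp/bart-base-chinese')

# Second pair is invented for illustration ("He drinks coffee every morning.").
references = ["犯人受到了嚴密的監控。", "他每天早上喝咖啡。"]
predictions = ["犯人受到嚴密監視。", "他每天早晨喝咖啡。"]

with patch.object(nltk, 'word_tokenize', tokenizer):
    metric_meteor = evaluate.load('meteor')
    meteor = metric_meteor.compute(predictions=predictions, references=references)
    print("METEOR:", meteor["meteor"])  # mean METEOR over the batch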