Why does sacrebleu need sentences to end with a dot?
If I remove the dots, the score becomes zero.
import sacrebleu
sys = ["This is cat."]
refs = [["This is a cat."],
        ["This is a bad cat."]]
b3 = sacrebleu.corpus_bleu(sys, refs)
print("b3", b3.score)
print("b3", round(b3.score,2))
This returns the following:
b3 35.1862973998119
b3 35.19
When I remove the ending dots:
sys = ["This is cat"]
refs = [["This is a cat"],
        ["This is a bad cat"]]
b3 = sacrebleu.corpus_bleu(sys, refs)
print("b3", b3.score)
print("b3", round(b3.score,2))
It prints zero using sacrebleu, which is again weird:
b3 0.0
b3 0.0
BLEU is defined as a geometric average of (modified) n-gram precisions for unigrams up to 4-grams, times a brevity penalty. Thus, if there is no matching 4-gram (no matching 4-tuple of words) in the whole test set, plain BLEU is 0 by definition. Having a dot at the end, which gets split off as its own token, makes the hypothesis four tokens long, so it contains a 4-gram at all; that 4-gram still does not match any reference, but sacrebleu's default smoothing then replaces the zero 4-gram match count with a small nonzero value, so the score is no longer zero. Without the dot the hypothesis has only three tokens, so there is no 4-gram to smooth and the 4-gram precision, and hence BLEU, is exactly zero.
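To see the effect concretely, here is a rough sketch (not sacrebleu's internal code) that only mimics how the default 13a tokenizer splits the trailing dot into its own token, and then counts the 4-grams in each hypothesis:

# Rough illustration, not sacrebleu internals: count 4-grams per hypothesis.
def fourgrams(tokens):
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

# "This is cat." is tokenized roughly as "This is cat ." by the 13a tokenizer
with_dot = "This is cat .".split()     # 4 tokens -> exactly one 4-gram
without_dot = "This is cat".split()    # 3 tokens -> no 4-grams at all

print(fourgrams(with_dot))     # [('This', 'is', 'cat', '.')]
print(fourgrams(without_dot))  # []

With the dot, the hypothesis contains one (unmatched) 4-gram, so the default smoothing can replace the zero match count with a small value; without the dot there is nothing to smooth, the 4-gram precision is zero, and the geometric mean collapses to zero.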
BLEU was designed for scoring test sets with hundreds of sentences, where such a case is very unlikely. For scoring single sentences, you can use a sentence-level version of BLEU, which applies some kind of smoothing, but the results are still not ideal. You can also use a character-based metric, e.g. chrF (sacrebleu -m chrf).
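For example, assuming a recent sacrebleu version (sentence_bleu and corpus_chrf are part of its public Python API):

import sacrebleu

sys = ["This is cat"]
refs = [["This is a cat"],
        ["This is a bad cat"]]

# sentence-level BLEU with smoothing; references for the single sentence
sb = sacrebleu.sentence_bleu(sys[0], [r[0] for r in refs])
print("sentence_bleu", round(sb.score, 2))

# character-based chrF, which does not hinge on whole-word 4-gram matches
chrf = sacrebleu.corpus_chrf(sys, refs)
print("chrf", round(chrf.score, 2))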
You can also pass use_effective_order=True to corpus_bleu, so that only the n-gram orders that can actually occur in the hypothesis are counted instead of all four. However, in that case, the metric is not exactly what people would refer to as BLEU.
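A sketch of that option (assuming a sacrebleu version whose corpus_bleu still accepts the use_effective_order keyword):

import sacrebleu

sys = ["This is cat"]
refs = [["This is a cat"],
        ["This is a bad cat"]]

# only n-gram orders that fit into the 3-token hypothesis are averaged,
# so the impossible 4-grams no longer force the score to zero
b = sacrebleu.corpus_bleu(sys, refs, use_effective_order=True)
print("effective-order BLEU", round(b.score, 2))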