from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'ae', 'test']]
candidate = ['this', 'is', 'ad', 'test']
score = sentence_bleu(reference, candidate)
print(score)
I am using this code to calculate the BLEU score, and the score I am getting is 1.0547686614863434e-154. I wonder why I am getting such a small value when only one letter is different in the candidate list.
score = sentence_bleu(reference, candidate, weights=[1])
I tried adding weights=[1] as a parameter and it gave me 0.75
as output. I can't understand why I have to add weights to get a reasonable result. Any help would be appreciated.
I thought it was maybe because the sentence was not long enough, so I added more words:
from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'ae', 'test', 'rest', 'pep', 'did']]
candidate = ['this', 'is', 'ad', 'test', 'rest', 'pep', 'did']
score = sentence_bleu(reference, candidate)
print(score)
Now I am getting 0.488923022434901
, but I still think that is too low a value.
By default, sentence_bleu
is configured with 4 weights: 0.25 for unigrams, 0.25 for bigrams, 0.25 for trigrams, and 0.25 for 4-grams. The length of weights
gives the maximum n-gram order, so the BLEU score is a geometric mean over 4 levels of n-grams. Your short pair has no matching trigrams or 4-grams at all, so those precisions are 0; with no smoothing, NLTK substitutes a tiny epsilon for them (and prints a warning), which is why the geometric mean collapses to roughly 1e-154 instead of a sensible value.
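You can check the per-order precisions yourself with the modified_precision helper from nltk.translate.bleu_score (a quick diagnostic sketch for your short pair):

from nltk.translate.bleu_score import modified_precision

reference = [['this', 'is', 'ae', 'test']]
candidate = ['this', 'is', 'ad', 'test']

# Modified n-gram precision for each order that sentence_bleu combines:
# unigrams 3/4, bigrams 1/3, trigrams 0/2, 4-grams 0/1.
for n in range(1, 5):
    print(n, float(modified_precision(reference, candidate, n)))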
When you use weights=[1]
, you only analyze unigrams; that is why your short pair scores 0.75 (3 of its 4 unigrams match). With the default weights, the longer pair gives:
reference = [['this', 'is', 'ae', 'test', 'rest', 'pep', 'did']]
candidate = ['this', 'is', 'ad', 'test', 'rest', 'pep', 'did']
>>> sentence_bleu(reference, candidate) # default weights, order of ngrams=4
0.488923022434901
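For comparison, restricting the longer pair to unigrams should come out around 6/7, since 6 of the 7 tokens match and the brevity penalty is 1 for equal lengths (a quick sketch, not a verbatim run):

from nltk.translate.bleu_score import sentence_bleu

reference = [['this', 'is', 'ae', 'test', 'rest', 'pep', 'did']]
candidate = ['this', 'is', 'ad', 'test', 'rest', 'pep', 'did']

# Unigram-only BLEU: 6 of 7 tokens match and the brevity penalty is 1,
# so this should print about 6/7 ≈ 0.857.
print(sentence_bleu(reference, candidate, weights=[1]))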
But you can also decide that unigrams are more important than bigrams, which in turn are more important than trigrams and 4-grams:
>>> sentence_bleu(reference, candidate, weights=[0.5, 0.3, 0.1, 0.1])
0.6511772622175621
You can also use the SmoothingFunction
methods, and read the docstrings in the source code for a better understanding.
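For instance, here is a minimal sketch with method1, which adds a small epsilon to the n-gram orders that have 0 counts, so short sentences no longer collapse to ~1e-154:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['this', 'is', 'ae', 'test']]
candidate = ['this', 'is', 'ad', 'test']

# method1 adds a small epsilon to precisions with zero counts, so the
# geometric mean stays well defined even for very short sentences.
smoothie = SmoothingFunction().method1
print(sentence_bleu(reference, candidate, smoothing_function=smoothie))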