Search code examples
machine-learningtranslationmetricsevaluationbleu

Variation in BLEU Score


I have some question on BLUE Score calculation for machine translation. I realized they may have a different metrics for BLEU. I found the code reports five value for BLEU, namely BLEU-1, BLEU-2, BLEU-3, BLEU-4 and finally BLEU, which seems to be an exponential average of the previous four BLEUs. Still it is not clear to me what the difference between those is. Do you have any ideas? Thanks

P.s. At first I thought that this question is more of a theoretical content and posted it on meta stackexange. A moderator has closed and commented it as a stackoverflow type question . So please don't punish me again. =)


Solution

  • source: http://www.statmt.org/book/slides/08-evaluation.pdf

    I haven't heard of BLEU-1 and BLEU-2 but I guess it means 1-gram, 2-gram, 3-gram and 4-gram in the formula of BLEU score, I mean in the formula precision[i] = BLEU-i in your question:
    enter image description here