How to specify additional parameters when using HuggingFace Evaluate's evaluate.combine() method?


I am using the HuggingFace Evaluate library to score my results with two metrics. Here is the code:

import evaluate
import numpy as np

metric = evaluate.combine(
    ["sacrebleu", "chrf"], force_prefix=True
)

In the compute_metrics() function, I call metric.compute() like this:

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    
    results = {"bleu": result["sacrebleu_score"], "chrf": result["chr_f_score"]}

    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    results["gen_len"] = np.mean(prediction_lens)
    results = {k: round(v, 4) for k, v in results.items()}
    return results

However, I would like chrF to be computed with word_order=2. How can I do that? Thanks.


Solution

  • I had the same question and spent some time reading the evaluate source code for combine and compute. There doesn't seem to be a way to pass metric-specific parameters to a combined compute() call.

    compute() does accept **kwargs, but any additional arguments you pass to metric.compute() are forwarded to every metric you combined. So in your case, passing word_order=2 leads to an error because sacrebleu does not have a word_order parameter.

    I resorted to just calling the metrics separately within my compute_metrics function, and then aggregating the results.
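
The workaround described above could look roughly like the sketch below. The helper name compute_separately and its per_metric_kwargs argument are made up for illustration and are not part of the evaluate API. It only assumes that each metric object exposes compute(predictions=..., references=..., **kwargs), which is how metrics loaded with evaluate.load() are called. Each result key is prefixed with the name you give the metric, to mimic force_prefix=True.

```python
def compute_separately(metrics, predictions, references, per_metric_kwargs=None):
    """Call each metric on its own and merge the results into one dict.

    metrics: dict mapping a prefix (e.g. "chr_f") to a loaded metric object.
    per_metric_kwargs: dict mapping the same prefix to the extra keyword
        arguments that only that metric should receive (hypothetical helper
        argument, not an evaluate feature).
    """
    per_metric_kwargs = per_metric_kwargs or {}
    results = {}
    for name, metric in metrics.items():
        # Only this metric sees its own extra arguments.
        extra = per_metric_kwargs.get(name, {})
        scores = metric.compute(
            predictions=predictions, references=references, **extra
        )
        # Prefix every key with the metric name, similar to force_prefix=True.
        for key, value in scores.items():
            results[f"{name}_{key}"] = value
    return results


# Possible usage inside compute_metrics() (requires the evaluate package):
#
#   import evaluate
#   metrics = {
#       "sacrebleu": evaluate.load("sacrebleu"),
#       "chr_f": evaluate.load("chrf"),
#   }
#   result = compute_separately(
#       metrics,
#       predictions=decoded_preds,
#       references=decoded_labels,
#       per_metric_kwargs={"chr_f": {"word_order": 2}},
#   )
#   results = {"bleu": result["sacrebleu_score"], "chrf": result["chr_f_score"]}
```

With the prefixes chosen as "sacrebleu" and "chr_f", the merged dict keeps the same sacrebleu_score and chr_f_score keys that the question's compute_metrics() already reads, so the rest of that function should not need to change.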