I am using the HuggingFace Evaluate library to evaluate my results with two metrics. Here is the code:
import evaluate

metric = evaluate.combine(
    ["sacrebleu", "chrf"], force_prefix=True
)
And in the compute_metrics() function, here is how I call metric.compute():
import numpy as np

# tokenizer and postprocess_text are defined elsewhere
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels, since they can't be decoded
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    results = {"bleu": result["sacrebleu_score"], "chrf": result["chr_f_score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    results["gen_len"] = np.mean(prediction_lens)
    results = {k: round(v, 4) for k, v in results.items()}
    return results
However, I would like chrF to use word_order=2. How can I do so? Thanks.
I had this same question, and I spent some time looking through the evaluate source code for combine and compute. There doesn't seem to be a way to pass metric-specific parameters to a combined compute() function.
compute() accepts **kwargs, but whatever additional arguments you specify in metric.compute() will be passed to each of the metrics you combined. So in your case, passing word_order=2 will lead to an error because sacrebleu does not have a word_order parameter.
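To illustrate the failure mode, here is a minimal sketch (assuming the combined metric from the question; the exact exception text may vary by evaluate version). The kwarg is broadcast to every sub-metric, so sacrebleu receives it too:

# Sketch: word_order is forwarded to both sub-metrics,
# and sacrebleu's _compute() does not accept it.
try:
    metric.compute(
        predictions=["the cat sat"],
        references=[["the cat sat"]],
        word_order=2,  # valid for chrf, unknown to sacrebleu
    )
except TypeError as err:
    print(err)  # unexpected keyword argument 'word_order'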
I resorted to just calling the metrics separately within my compute_metrics function and then aggregating the results, as sketched below.
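Here is a minimal sketch of that workaround, assuming decoded_preds and decoded_labels are the postprocessed strings from the question's compute_metrics (the helper name compute_combined is mine, not from the library):

import evaluate

# Load the metrics individually instead of combining them,
# so each compute() call can take metric-specific kwargs.
sacrebleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")

def compute_combined(decoded_preds, decoded_labels):
    bleu_result = sacrebleu.compute(
        predictions=decoded_preds, references=decoded_labels
    )
    chrf_result = chrf.compute(
        predictions=decoded_preds,
        references=decoded_labels,
        word_order=2,  # chrF++: include word 2-grams
    )
    # Aggregate the two results into one dict by hand
    return {"bleu": bleu_result["score"], "chrf": chrf_result["score"]}

Since each metric now has its own compute() call, any chrf-specific parameter (word_order, char_order, beta, ...) can be passed without sacrebleu ever seeing it.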