I'm following the guide here: https://huggingface.co/docs/transformers/v4.28.1/tasks/summarization. There is one line in the guide like this:

    labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)

I don't understand the function of the text_target parameter.
I tried the following code, and the last two lines gave exactly the same results.

    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained('t5-small')
    text = "Weiter Verhandlung in Syrien."
    tokenizer(text_target=text, max_length=128, truncation=True)
    tokenizer(text, max_length=128, truncation=True)
The docs just say: text_target (str, List[str], List[List[str]], optional) — The sequence or batch of sequences to be encoded as target texts.

I don't really understand that. Are there situations where setting text_target will give a different result?
Sometimes it is necessary to look at the code:
    if text is None and text_target is None:
        raise ValueError("You need to specify either `text` or `text_target`.")
    if text is not None:
        # The context manager will send the inputs as normal texts and not text_target, but we shouldn't change the
        # input mode in this case.
        if not self._in_target_context_manager:
            self._switch_to_input_mode()
        encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
    if text_target is not None:
        self._switch_to_target_mode()
        target_encodings = self._call_one(text=text_target, text_pair=text_pair_target, **all_kwargs)
        # Leave back tokenizer in input mode
        self._switch_to_input_mode()
    if text_target is None:
        return encodings
    elif text is None:
        return target_encodings
    else:
        encodings["labels"] = target_encodings["input_ids"]
        return encodings
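The three return paths can be illustrated with a toy stand-in (ToyTokenizer and its character-to-code-point "encoding" are invented for illustration; this is not the real transformers code, only a sketch of the dispatch logic above):

    class ToyTokenizer:
        def _encode(self, text):
            # Stand-in for self._call_one(): map each character to its code point.
            return {"input_ids": [ord(c) for c in text]}

        def __call__(self, text=None, text_target=None):
            if text is None and text_target is None:
                raise ValueError("You need to specify either `text` or `text_target`.")
            if text is not None:
                encodings = self._encode(text)
            if text_target is not None:
                target_encodings = self._encode(text_target)
            # The three return paths from the snippet above:
            if text_target is None:
                return encodings                   # only text given
            elif text is None:
                return target_encodings            # only text_target given
            else:
                encodings["labels"] = target_encodings["input_ids"]
                return encodings                   # both given -> adds a "labels" key

    tok = ToyTokenizer()
    print(tok(text="ab"))                    # {'input_ids': [97, 98]}
    print(tok(text_target="ab"))             # identical: {'input_ids': [97, 98]}
    print(tok(text="ab", text_target="cd"))  # {'input_ids': [97, 98], 'labels': [99, 100]}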
As you can see in the above snippet, both text and text_target are passed to self._call_one() to encode them (note that text_target is passed as the text parameter). That means the encoding of the same string as text or as text_target will be identical as long as _switch_to_target_mode() doesn't do anything special.
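For T5 it doesn't, which is why your two calls agree. But for multilingual tokenizers such as MBart, which prepend a source-language code to inputs and a target-language code to labels, the target mode does change the output. Here is a hypothetical sketch of that behavior (all names and ids are invented; it only mimics the mode-switching pattern, not the real MBart tokenizer):

    class ToyMultilingualTokenizer:
        SRC_LANG_ID = 1  # stand-in for a source-language code like "en_XX"
        TGT_LANG_ID = 2  # stand-in for a target-language code like "ro_RO"

        def __init__(self):
            self._mode = "input"

        def _switch_to_input_mode(self):
            self._mode = "input"

        def _switch_to_target_mode(self):
            self._mode = "target"

        def _encode(self, text):
            # The prepended language id depends on the current mode.
            lang = self.SRC_LANG_ID if self._mode == "input" else self.TGT_LANG_ID
            return {"input_ids": [lang] + [ord(c) for c in text]}

        def __call__(self, text=None, text_target=None):
            if text is None and text_target is None:
                raise ValueError("You need to specify either `text` or `text_target`.")
            if text is not None:
                self._switch_to_input_mode()
                encodings = self._encode(text)
            if text_target is not None:
                self._switch_to_target_mode()
                target_encodings = self._encode(text_target)
                self._switch_to_input_mode()  # leave the tokenizer in input mode
            if text_target is None:
                return encodings
            elif text is None:
                return target_encodings
            else:
                encodings["labels"] = target_encodings["input_ids"]
                return encodings

    tok = ToyMultilingualTokenizer()
    print(tok(text="hi"))         # {'input_ids': [1, 104, 105]}
    print(tok(text_target="hi"))  # {'input_ids': [2, 104, 105]} -- different!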
The conditions at the end of the function answer your question:

- If you only pass text, you will retrieve the encoding of it.
- If you only pass text_target, you will retrieve the encoding of it.
- If you pass both text and text_target, you will retrieve the encoding of text, with the token ids of text_target stored as the value of the labels key.

To be honest, I think the implementation is a bit unintuitive. I would expect that passing only text_target would return an object that only contains the labels key. I assume they wanted to keep their output objects and the respective documentation simple and therefore went for this implementation. Or there is a model for which it actually makes sense that I am unaware of.