I trained a multitask model using AllenNLP 2.0 and now want to predict on new examples using the allennlp predict command.
Problem/Error:
I am using the following command: allennlp predict results/model.tar.gz new_instances.jsonl --include-package mtl_sd --predictor mtlsd_predictor --use-dataset-reader --dataset-reader-choice validation
This gives me the following error:
Traceback (most recent call last):
  File ".../mtl_sd_venv/bin/allennlp", line 10, in <module>
    sys.exit(run())
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 119, in main
    args.func(args)
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 220, in _predict
    manager.run()
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 186, in run
    for batch in lazy_groups_of(self._get_instance_data(), self._batch_size):
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/common/util.py", line 139, in lazy_groups_of
    s = list(islice(iterator, group_size))
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 180, in _get_instance_data
    yield from self._dataset_reader.read(self._input_file)
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/data/dataset_readers/multitask.py", line 31, in read
    raise RuntimeError("This class is not designed to be called like this")
RuntimeError: This class is not designed to be called like this
As far as I understand, this is what's going on:
This RuntimeError is raised by the MultiTaskDatasetReader because its read() method is not meant to be called directly. The read() method should only be called on the specific DatasetReaders in MultiTaskDatasetReader.readers.
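The pattern described above can be sketched in plain Python. The classes below (JsonlReader, MultiTaskReader) are hypothetical stand-ins, not AllenNLP's DatasetReader or MultiTaskDatasetReader; they only mirror the behavior seen in the traceback: the wrapper refuses to read, and each named sub-reader does the actual work. To stay self-contained, read() takes an iterable of JSON lines instead of a file path.

```python
import json
from typing import Dict, Iterable, Iterator


class JsonlReader:
    """Hypothetical single-task reader: turns JSON lines into dicts."""

    def read(self, lines: Iterable[str]) -> Iterator[dict]:
        for line in lines:
            if line.strip():  # skip blank lines
                yield json.loads(line)


class MultiTaskReader:
    """Hypothetical wrapper that only holds named sub-readers."""

    def __init__(self, readers: Dict[str, JsonlReader]) -> None:
        self.readers = readers

    def read(self, lines: Iterable[str]) -> Iterator[dict]:
        # Mirrors the error in the traceback: the wrapper itself cannot read.
        raise RuntimeError("This class is not designed to be called like this")


reader = MultiTaskReader({"SemEval2016": JsonlReader()})
data = ['{"text1": "target", "text2": "claim"}']

# Reading through a named sub-reader works...
instances = list(reader.readers["SemEval2016"].read(data))
# ...while calling reader.read(data) raises RuntimeError.
```

The point of the sketch is that the caller has to pick a sub-reader by name; nothing in the wrapper decides that for it.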
The read() method of the MultiTaskDatasetReader is called because in the Jsonnet config I have specified the DatasetReaders as follows:
"dataset_reader": {
    "type": "multitask",
    "readers": {
        "SemEval2016": {
            "type": "SemEval2016",
            "max_sequence_length": 509,
            "token_indexers": {
                "bert": {
                    "type": "pretrained_transformer",
                    "model_name": "bert-base-cased"
                }
            },
            "tokenizer": {
                "type": "pretrained_transformer",
                "model_name": "bert-base-cased"
            }
        }, ...
    }
}
Usually the type of dataset_reader determines which dataset reader class is instantiated for prediction. In this case, however, the type just points to MultiTaskDatasetReader, which does not support read() itself and instead contains multiple DatasetReaders.
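The type-based dispatch described above can be sketched with a toy registry. REGISTRY, register, from_config, SemEvalReader and MultiTaskReader are hypothetical names, a simplified stand-in for AllenNLP's registrable mechanism rather than its real API, and the tokenizer and token-indexer settings from the config are left out. Building from the top-level config yields the wrapper, while building from one entry under readers yields a single-task reader.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping a config "type" string to a class.
REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    def decorator(cls: Type) -> Type:
        REGISTRY[name] = cls
        return cls
    return decorator


def from_config(config: Dict[str, Any]) -> Any:
    """Instantiate the class named by config["type"] with the remaining keys."""
    params = dict(config)  # copy so the caller's dict is not mutated
    cls = REGISTRY[params.pop("type")]
    return cls(**params)


@register("SemEval2016")
class SemEvalReader:
    def __init__(self, max_sequence_length: int = 512) -> None:
        self.max_sequence_length = max_sequence_length


@register("multitask")
class MultiTaskReader:
    def __init__(self, readers: Dict[str, Dict[str, Any]]) -> None:
        # Each sub-config is built recursively; the wrapper only stores them.
        self.readers = {name: from_config(sub) for name, sub in readers.items()}


config = {
    "type": "multitask",
    "readers": {"SemEval2016": {"type": "SemEval2016", "max_sequence_length": 509}},
}

top_level = from_config(config)                         # a MultiTaskReader
single = from_config(config["readers"]["SemEval2016"])  # a SemEvalReader
```

In this toy version, whoever builds from the top-level dataset_reader block always gets the wrapper; choosing a single reader requires naming one of the nested entries explicitly.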
As far as I understand, when using allennlp predict I need to somehow specify which of the multiple DatasetReaders should be used.
The question is:
How can I specify which specific DatasetReader (of the multiple DatasetReaders in MultiTaskDatasetReader.readers) should be used when executing allennlp predict? Or, more generally: how can I get allennlp predict to run with a MultiTaskDatasetReader?
Additional code, for completeness. The predictor:
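from allennlp.common.util import JsonDict
from allennlp.data import Instance
from allennlp.predictors import Predictor
from overrides import overrides

@Predictor.register('mtlsd_predictor')
class MTLSDPredictor(Predictor):
    def predict(self, target: str, claim: str) -> JsonDict:
        # Use the same keys that _json_to_instance reads below.
        return self.predict_json({'text1': target, 'text2': claim})

    @overrides
    def _json_to_instance(self, json_dict: JsonDict) -> Instance:
        # Each input dict is expected to carry 'text1' (target) and 'text2' (claim).
        target = json_dict['text1']
        claim = json_dict['text2']
        return self._dataset_reader.text_to_instance(target, claim)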
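Without --use-dataset-reader, allennlp predict parses each line of the input file as JSON and eventually passes the resulting dict to the predictor's _json_to_instance, so with the predictor above each line of new_instances.jsonl would need text1 and text2 keys. A minimal sketch of that data flow, using a hypothetical StubReader in place of the real dataset reader and plain tuples instead of AllenNLP Instance objects:

```python
import json
from typing import Any, Dict, List, Tuple


class StubReader:
    """Hypothetical stand-in for a dataset reader's text_to_instance."""

    def text_to_instance(self, target: str, claim: str) -> Tuple[str, str]:
        return (target, claim)


def json_to_instance(reader: StubReader, json_dict: Dict[str, Any]) -> Tuple[str, str]:
    # Same key mapping as MTLSDPredictor._json_to_instance above.
    target = json_dict["text1"]
    claim = json_dict["text2"]
    return reader.text_to_instance(target, claim)


lines = [
    '{"text1": "target A", "text2": "claim A"}',
    '{"text1": "target B", "text2": "claim B"}',
]
reader = StubReader()
instances: List[Tuple[str, str]] = [json_to_instance(reader, json.loads(line)) for line in lines]
```

A line missing either key fails with a KeyError before the reader is ever called.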
There are two issues here. One is a bug in AllenNLP that is fixed in version 2.1.0. The other is that @sinaj was missing the default_predictor in his model head.
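Why a missing default_predictor matters can be illustrated with a toy lookup: a multitask predictor presumably needs to know, for each head, which single-task predictor to build, and a head that names none leaves it nothing to construct. Everything below (PREDICTORS, Head, SemEvalHead, BareHead, build_head_predictors) is hypothetical and only sketches the idea; it is not AllenNLP's multitask predictor. Modeling default_predictor as a class attribute on the head is an assumption here.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry of predictor factories, keyed by registered name.
PREDICTORS: Dict[str, Callable[[str], str]] = {
    "mtlsd_predictor": lambda head_name: f"MTLSDPredictor for {head_name}",
}


class Head:
    # Assumed: each head names the predictor that should serve it.
    default_predictor: Optional[str] = None


class SemEvalHead(Head):
    default_predictor = "mtlsd_predictor"


class BareHead(Head):
    pass  # declares no default_predictor


def build_head_predictors(heads: Dict[str, Head]) -> Dict[str, str]:
    """Build one predictor per head from its default_predictor name."""
    predictors = {}
    for name, head in heads.items():
        if head.default_predictor is None:
            raise ValueError(f"head {name!r} has no default_predictor")
        predictors[name] = PREDICTORS[head.default_predictor](name)
    return predictors


result = build_head_predictors({"SemEval2016": SemEvalHead()})
# build_head_predictors({"other": BareHead()}) raises ValueError.
```

In the actual answer, the fix combines upgrading to AllenNLP 2.1.0 with declaring default_predictor on the model head.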