Tags: python, huggingface-transformers

How do I get a pretrained model from hugging face running on my own data?


I found a pretrained model in this repository: https://github.com/causalNLP/logical-fallacy and I want to get it running on my own data locally.

In the description it says:

import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('path_to_saved_model', num_labels=3)
tokenizer = AutoTokenizer.from_pretrained('path_to_tokenizer', do_lower_case=True)

What is meant by path_to_saved_model and path_to_tokenizer?


Solution

  • I recommend you read the documentation provided on the Hugging Face website.

    To answer your question: for AutoTokenizer, path_to_saved_model corresponds to the pretrained_model_name_or_path argument, which is documented as:

    pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
    • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/. A path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like Bert or XLNet), e.g.: ./my_model_directory/vocab.txt. (Not applicable to all derived classes)

    The same applies to AutoModelForSequenceClassification: you can pass either a model ID hosted on huggingface.co or a path to a local directory saved with save_pretrained().
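
    As a minimal sketch (the directory name "./my_model_directory/", the number of labels, and the example sentence are placeholders I chose for illustration, not values from the logical-fallacy repo; substitute the checkpoint path or Hub model ID that the repo actually provides), loading the model and running it on your own text could look like this:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Either a Hub model ID (e.g. "bert-base-uncased") or a local directory
    # created with save_pretrained() works here; this path is a placeholder.
    model = AutoModelForSequenceClassification.from_pretrained("./my_model_directory/", num_labels=3)
    tokenizer = AutoTokenizer.from_pretrained("./my_model_directory/", do_lower_case=True)

    # Tokenize your own text and run a forward pass without tracking gradients
    inputs = tokenizer("Everyone believes it, so it must be true.",
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Index of the highest-scoring class (0 .. num_labels-1); what each index
    # means depends on the label mapping used when the model was trained.
    predicted_class = logits.argmax(dim=-1).item()
    print(predicted_class)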