
Huggingface transformers module not recognized by Anaconda


I am using Anaconda, Python 3.7, Windows 10.

I tried to install transformers following https://huggingface.co/transformers/ in my env. I am aware that I must have either PyTorch or TF installed; I have PyTorch installed, as seen in the Anaconda Navigator environments.

I got many kinds of errors, depending on where (Anaconda / prompt) I uninstalled and reinstalled PyTorch and transformers. On my last attempt, using conda install pytorch torchvision cpuonly -c pytorch and conda install -c conda-forge transformers, I get an error with this code:

from transformers import BertTokenizer
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

def tok(dataset):
    input_ids = []
    attention_masks = []
    sentences = dataset.Answer2EN.values
    labels = dataset.Class.values
    for sent in sentences:
        encoded_sent = bert_tokenizer.encode(sent,
                                             add_special_tokens=True,
                                             max_length=64,
                                             pad_to_max_length=True)

TypeError: _tokenize() got an unexpected keyword argument 'pad_to_max_length'

Does anyone know a reliable way to install transformers with Anaconda? Thank you


Solution

  • The problem is that conda only offers the transformers library in version 2.1.1 (repository information), and that version does not have a pad_to_max_length argument. I don't want to look up whether there was a different parameter for it, but you can simply pad the result yourself (it is just a list of integers):

    from transformers import BertTokenizer
    bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
    
    sentences = ['this is just a test', 'this is another test']
    
    max_length = 64
    
    for sent in sentences:
        encoded_sent = bert_tokenizer.encode(sent,
                                             add_special_tokens=True,
                                             max_length=max_length)
        encoded_sent.extend([0] * (max_length - len(encoded_sent)))
    
        ###your other stuff
    

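    Since the tok function in the question also collects attention masks, manual padding can produce those at the same time. A minimal sketch (pad_and_mask is a hypothetical helper, not part of transformers; 0 is assumed to be the [PAD] token id, which holds for bert-base-uncased):

    ```python
    def pad_and_mask(encoded_sent, max_length, pad_id=0):
        """Pad a list of token ids to max_length and build its attention mask.

        Mirrors what pad_to_max_length=True does in newer tokenizer versions:
        1 marks real tokens, 0 marks padding.
        """
        n_pad = max_length - len(encoded_sent)
        attention_mask = [1] * len(encoded_sent) + [0] * n_pad
        padded = encoded_sent + [pad_id] * n_pad
        return padded, attention_mask

    # Example with made-up token ids (101/102 are BERT's [CLS]/[SEP]):
    ids, mask = pad_and_mask([101, 2023, 2003, 102], max_length=8)
    # ids  -> [101, 2023, 2003, 102, 0, 0, 0, 0]
    # mask -> [1, 1, 1, 1, 0, 0, 0, 0]
    ```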
    The better option, in my opinion, is to create a new conda environment and install everything via pip rather than conda. This lets you work with the most recent transformers version (2.11).
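
    That could look roughly like this (a sketch; the environment name is a placeholder, and for GPU builds of PyTorch you should follow the selector on pytorch.org instead):

    ```shell
    # Fresh environment so conda's old transformers 2.1.1 can't interfere
    conda create -n transformers-env python=3.7
    conda activate transformers-env

    # Install both packages from PyPI with pip
    pip install torch
    pip install transformers
    ```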