Tags: python-3.x, pandas, loops, huggingface-transformers, huggingface-tokenizers

Apply transformer model to each row in a pandas column


I want to apply a translation model to every row of one of my data frame columns.

This is the translation code I am using:

from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
# Loop here over all rows in the German_Text column
text = "Wie geht es dir heute"  # placeholder for the value of a single row
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)

I want to apply this model to the following column and then create a new column holding the translations:

German_Text                     English_Text
Wie geht es dir heute
mir geht es gut

The English_Text column should contain the model's translation of each row, so I would like to apply the model to every row of the German_Text column and store the result in English_Text.


Solution

  • Wrap the steps in a function and pass it to the apply method of the German_Text column:

    import pandas as pd
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer
    
    mname = "allenai/wmt19-de-en-6-6-big"
    tokenizer = FSMTTokenizer.from_pretrained(mname)
    model = FSMTForConditionalGeneration.from_pretrained(mname)
    
    df = pd.DataFrame(['Wie geht es dir heute', 'mir geht es gut'], columns=['German_Text'])
    
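    def translationPipeline(text):
        # Tokenize one German sentence into a batch of size 1
        input_ids = tokenizer.encode(text, return_tensors="pt")
        # Generate the English token ids
        outputs = model.generate(input_ids)
        # Convert the first (and only) generated sequence back to a string
        decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
        return decoded
    
    # apply calls translationPipeline once per row
    df['English_Text'] = df['German_Text'].apply(translationPipeline)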
    print(df)
    

    Output:

                 German_Text             English_Text
    0  Wie geht es dir heute  How are you doing today
    1        mir geht es gut                 I'm fine
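
Because apply calls the function once per row, each sentence goes through model.generate on its own, which can be slow for long columns. Below is a minimal sketch of a batched alternative, assuming the tokenizer can pad a list of sentences to a common length. The helper names translate_in_batches and make_batch_translator and the default batch_size are hypothetical, chosen for illustration; they are not part of pandas or transformers.

```python
from typing import Callable, Iterable, List


def translate_in_batches(
    texts: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,  # assumed default; tune to available memory
) -> List[str]:
    """Translate texts in chunks of batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(texts)  # accepts a list or a pandas Series
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(translate_batch(batch))
    return results


def make_batch_translator(tokenizer, model) -> Callable[[List[str]], List[str]]:
    """Build a batch function from a Hugging Face tokenizer and seq2seq model.

    Assumption: the tokenizer defines a padding token, so padding=True works.
    """
    def translate_batch(batch: List[str]) -> List[str]:
        # Tokenize all sentences together, padded to the longest one
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        # Pass input_ids and attention_mask to generate
        outputs = model.generate(**encoded)
        # Decode every generated sequence back to a string
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return translate_batch
```

With the tokenizer, model and df defined above, this could be used as df['English_Text'] = translate_in_batches(df['German_Text'], make_batch_translator(tokenizer, model)). Batched generation may produce slightly different output than the per-row version for some models, so it is worth comparing a few rows before relying on it.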