I want to apply a translation model to every row of one column in my data frame.
The translation code I am using:
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
# Loop here over all rows in the German_Text column; `text` stands for one row's string
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
I want to apply this model to the following column and store the results in a new translated column:
German_Text English_Text
Wie geht es dir heute
mir geht es gut
The English_Text column should hold the model's translation of each row in the German_Text column.
All you need to do is wrap the steps in a function and pass it to the apply method of the data frame column:
import pandas as pd
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
df = pd.DataFrame(['Wie geht es dir heute', 'mir geht es gut'], columns=['German_Text'])
def translationPipeline(text):
    input_ids = tokenizer.encode(text, return_tensors="pt")
    outputs = model.generate(input_ids)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded
df['English_Text'] = df['German_Text'].apply(translationPipeline)
print(df)
Output:
             German_Text             English_Text
0  Wie geht es dir heute  How are you doing today
1        mir geht es gut                 I'm fine
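Calling the model once per row through apply is fine for a handful of rows, but it can get slow on a larger column. One possible variation, sketched below under some assumptions, is to translate each distinct string only once and to send the strings to the model in batches. The helper names translate_unique_in_batches and make_fsmt_batch_translator and the batch_size default of 16 are hypothetical and not part of pandas or transformers. The sketch relies on the tokenizer being callable on a list of strings with padding=True and on its batch_decode method.

```python
from typing import Callable, Dict, List


def translate_unique_in_batches(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,  # hypothetical default, tune for your hardware
) -> List[str]:
    """Translate texts in batches, translating each distinct string only once."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique_texts = list(dict.fromkeys(texts))
    cache: Dict[str, str] = {}
    for start in range(0, len(unique_texts), batch_size):
        batch = unique_texts[start:start + batch_size]
        for source, target in zip(batch, translate_batch(batch)):
            cache[source] = target
    # Map every original row back to its cached translation.
    return [cache[text] for text in texts]


def make_fsmt_batch_translator(tokenizer, model) -> Callable[[List[str]], List[str]]:
    """Wrap an already loaded tokenizer and model as a batch translation callable."""
    def translate_batch(batch: List[str]) -> List[str]:
        # padding=True pads the shorter sentences so the batch forms one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(**inputs)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return translate_batch


# Usage with the tokenizer, model and df defined above (not executed here):
# translate_batch = make_fsmt_batch_translator(tokenizer, model)
# df['English_Text'] = translate_unique_in_batches(df['German_Text'].tolist(), translate_batch)
```

Because the translation step is passed in as a plain callable, the batching logic can be checked with a stub function before the real model is loaded. Batched generation with padding may produce slightly different text than translating one sentence at a time, so it is worth comparing a few rows against the apply version.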