python machine-learning nlp dataset google-translate

Google Translate for an NLP dataset


I am working on an NLP dataset and want to use Google Translate for oversampling, translating each text from English to German and back. I have truncated the text in each row to 4000 characters, but when I try to translate I get an error. I am using the latest googletrans release, installed with pip install googletrans==4.0.0-rc1

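from googletrans import Translator
translator = Translator()
df['Sentence'] = df['sentence'].str.slice(0, 4000)  # cap each row at 4000 characters
df['translation_text'] = df['Sentence'].apply(lambda x: translator.translate(x, src='en', dest='de').text)  # English -> German
df['sentence2'] = df['translation_text'].apply(lambda x: translator.translate(x, src='de', dest='en').text)  # German -> English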

The same error shows up every time:

TypeError: the JSON object must be str, bytes or bytearray, not NoneType

I am confused about what I am missing that keeps this code from working.


Solution

  • Hey guys, googletrans version 3.0.0a0 works best. The character limit is 5k characters per request, so slice the strings to stay under it and install this specific version, and the translator will work well.