Resolve See in the end of this post for the solution
Good evening.
Im trying to play with the google translate v3 api.
And I arrive on a mystical encoding issue.
I do this :
def translate_text_langueTarget(texteToTranslate, langueTarget):
parent = client.location_path(project_id, location)
langueOrigin = detect_language(texteToTranslate)
if (langueOrigin == "en" and langueTarget == "en"):
return(texteToTranslate)
try:
response = client.translate_text(
parent=parent,
contents=[texteToTranslate],
mime_type='text/plain',
source_language_code=langueOrigin,
target_language_code=langueTarget)
translatedTexte = str(response.translations)[19:-3]
except:
translatedTexte = "Sorry my friend, the translation is lost on the internet"
print(response)
print(type(response))
print(response.translations)
print(type(response.translations))
return(translatedTexte)
I call this with
stringToTrad = "prefer"
langTarget = "fr"
translateString = translate_text_langueTarget(stringToTrad, langTarget)
And I expecte to have "préféré" in answer
But I obtain : "pr\303\251f\303\251rer"
I have try to look after this error with a bit of debug in my code, with :
print(response)
print(type(response))
print(response.translations)
print(type(response.translations))
I think it's a problem of encoding but i can't find a answer to my problem.
I work in python and my scrip is tag :
#! /usr/bin/env python3
# coding: utf-8
in the header
Do you have an idea ?
Resolve. I use :
translatedTexte = codecs.escape_decode(translatedTexte)[0]
translatedTexte = translatedTexte.decode("utf8")
API of Google Translate gives you UTF-8 text.
You got c3 a9
(303 251 as octal numbers) which it is really é
, as expected.
So your code take the correct UTF-8 file and it writes it as maybe wrong encoding.
This line is just a myth, not useful:
# coding: utf-8
If you want that your code interpret input and output as UTF-8, you should explicitly say so. With your code, I assume that (one problem) is that you use print (better to write into a file). On Windows, by default, terminals are not UTF-8, but old "Windows ANSI like and extended also know as Windows 1252" encoding.
So write into a file (with explicit UTF-8 encoding), or just change terminal settings, to have UTF-8 terminal. In addition, you may have escape sequences, on results. To me, it smell much, to have results written in octal way. Not a think of standard Python (and it will complain, about wrong encoding). You may need to parse the response, to translate escape sequences.