I'm trying to add contacts to SendGrid from a database that occasionally stores the user's e-mail address in Punycode, e.g. example-email@xn--yaho-sqa.com, which translates to example-email@yahóo.com in Unicode.
If I try to add the ASCII version, SendGrid rejects it with an error; however, it does accept the Unicode version.
So is there a way to convert these addresses in Python?
Long story short: is there a way to decode Punycode to Unicode?
Edit
As suggested in the comments, I tried
'example-email@yahóo.com'.encode('punycode').decode()
which returns example-email@yaho.com-cgc.
That result is incorrect outside of Python, so it is not a valid solution.
Thanks in advance.
There is the xn-- ACE prefix in your encoded e-mail address:
The ACE prefix for IDNA is "xn--" or any capitalization thereof.
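The punycode codec tried in the question is plain RFC 3492 Punycode: it treats the whole string as a single label and adds no "xn--" prefix. IDNA (RFC 3490) instead works only on the domain, encodes each label separately and prefixes it with "xn--". A minimal comparison, assuming Python 3 and the address from the question:

```python
address = "example-email@yahóo.com"

# The raw 'punycode' codec treats the whole string as one label: it copies the
# ASCII characters, appends '-', then the encoded non-ASCII part. No "xn--".
raw = address.encode("punycode").decode("ascii")

# IDNA is applied to the domain only, label by label, and each encoded label
# gets the ACE prefix "xn--".
local, _, domain = address.rpartition("@")
ace = f"{local}@{domain.encode('idna').decode('ascii')}"

print(raw)  # example-email@yaho.com-cgc
print(ace)  # example-email@xn--yaho-sqa.com
```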
So apply the idna encoding (see Python Specific Encodings):
Codec idna: Implement RFC 3490, see also encodings.idna. Only errors='strict' is supported.
Result:
'yahóo.com'.encode('idna').decode()
# 'xn--yaho-sqa.com'
and vice versa:
'xn--yaho-sqa.com'.encode().decode('idna')
# 'yahóo.com'
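Since only the domain part of an e-mail address is IDNA-encoded, one approach for the SendGrid case is to split on the last "@" and decode just the domain. A minimal sketch using the stdlib codec; decode_email_domain is a hypothetical helper name, not part of any library, and it assumes rows already stored in Unicode should be passed through unchanged:

```python
def decode_email_domain(address: str) -> str:
    """Return `address` with its domain converted from ACE ("xn--") form to Unicode.

    Hypothetical helper: the local part (before the last "@") is left untouched,
    because IDNA applies only to domain names.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an e-mail address: {address!r}")
    if not domain.isascii():
        # Already in Unicode form (assumption: such rows need no conversion).
        return address
    # The stdlib 'idna' codec (RFC 3490) decodes every xn-- label and passes
    # plain ASCII labels through unchanged.
    return f"{local}@{domain.encode('ascii').decode('idna')}"


print(decode_email_domain("example-email@xn--yaho-sqa.com"))  # example-email@yahóo.com
print(decode_email_domain("someone@example.com"))             # someone@example.com
```

Running every address through such a helper before sending it to SendGrid would normalise the occasional Punycode rows while leaving ordinary ASCII and Unicode addresses as they are.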
You could use the idna library instead:
Support for the Internationalised Domain Names in Applications (IDNA) protocol as specified in RFC 5891. This is the latest version of the protocol and is sometimes referred to as “IDNA 2008”.
This library also provides support for Unicode Technical Standard 46, Unicode IDNA Compatibility Processing.
This acts as a suitable replacement for the “encodings.idna” module that comes with the Python standard library, but which only supports the older superseded IDNA specification (RFC 3490).
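A sketch that prefers the third-party idna package when it is installed (e.g. with pip install idna) and otherwise falls back to the stdlib codec; domain_to_unicode is a hypothetical helper name, and the fallback design is an assumption rather than something the library prescribes:

```python
try:
    # Third-party package (pip install idna): IDNA 2008, RFC 5891.
    import idna
except ImportError:
    # Not installed: fall back to the stdlib codec (IDNA 2003, RFC 3490).
    idna = None


def domain_to_unicode(domain: str) -> str:
    """Hypothetical helper: convert an ACE ("xn--") domain name to Unicode."""
    if idna is not None:
        # idna.decode() validates each label against the IDNA 2008 rules and
        # raises idna.IDNAError for labels it does not accept.
        return idna.decode(domain)
    return domain.encode("ascii").decode("idna")


print(domain_to_unicode("xn--yaho-sqa.com"))  # yahóo.com
```

The two paths agree for this domain but can disagree on a few characters; for example, IDNA 2003 maps German ß to ss, whereas IDNA 2008 treats ß as a valid character in its own right.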