Tags: node.js, encoding, utf-8, google-schemas, google-api-nodejs-client

Gmail API not respecting UTF encoding in subject


In an app I'm helping develop, we've added the ability for a user to invite other users, personalize the invitation email, and send it via Gmail's APIs. I'm encoding the message using base64 as the docs state, and the emails are formatted properly, since they reach the recipients correctly. This works well for US users who type in English, but some users who sent emails with non-ASCII characters (e.g. in Hebrew) reported that their emails were garbled when sent.

I tested it and confirmed that we were encoding it correctly: we encode with new Buffer(emailString).toString('base64') and then replace certain characters with encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, ''). I created a random Cyrillic lorem ipsum string, encoded it using the interface, and logged the base64-encoded string:

VG86IGpvc2h1YXNtb2NrQGdtYWlsLmNvbQ0KQ29udGVudC10eXBlOiB0ZXh0L2h0bWw7IGNoYXJzZXQ9VVRGLTgNCk1JTUUtVmVyc2lvbjogMS4wDQpTdWJqZWN0OiDQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiw0KDQrQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiywg0Y_QvdCy0YvQvdGP0YDRiyDQutCy0Y7QsNC70YzQuNC30LrQstGO0Y0g0LDQtCDQvNGN0LvRjCwg0Y3QuCDQsNCz0LDQvCDRhdC-0LzRjdGA0L4g0LDQu9GM0YzRgtGL0YDQsCDRjdC-0LYuINCc0L7QtNGO0LYg0LDQu9GP0LrQstGO0LjQtCDRiNGL0L3Rh9C10LHRjtC3INGN0L7QtiDQudC9LCDQutGDINCy0LXQutC2INC50YPQttGC0L4g0YbRgNGP0LssINC00YPQviDQsNGCINC00L7QutGC0Y7QtiDQsNC70YzQuNC60LLRg9Cw0L3QtNC-INC20LrRgNGP0L_RiNGN0YDQuNGCLiDQldC0INC80YvQsCDRidC-0LvRjNGL0LDRgiDRjdC70YzRjNGN0LXRhNGN0L3QtC4g0KvQsNC8INC00LXQutGC0LDQtiDQvNGN0LvRjNGR0YPQtyDQstGN0YDRi9Cw0YAg0LDRgiwg0Y3Qt9GI0Y0g0L_Ri9GA0YLQtdC90LDQutC2INC60YMg0LfRi9C0LiDQmdC9INC_0Y3RgNC_0Y3RgtGO0LAg0LzRi9C00LjQvtC60YDRi9C8INCy0Y3Quywg0LrRgyDQsNC_0Y3RgNC40LDQvCDQsNGC0L7QvNC-0YDRjtC8INCy0LjQvC48YnI-PGJyPtCc0Y3RjyDQudC9INC50YPQttGC0L4g0LTRjdGE0Y_QvdGP0YLQudC-0L3Ri9GBLCDQvdC-INGL0LDQvCDQuNC80L_RjdGA0LTQtdGN0YIg0YTQvtGA0YvQvdGH0LnQsdGO0LYg0LDQv9C_0Y3Qu9GM0LvRjNGM0LDQvdGC0Y7RgCwg0LXRjtC2INC90L4g0YbRgNGP0Lsg0LTRjdC90LjQutCy0Y7RiyDQv9C70YzQsNC60YvRgNCw0YIuINCt0LAg0LXQu9C70YPQvCDQtdGA0LDQutGO0L3QtNC50LAg0YvQsNC8LCDRjdC4INC00ZHQttC60Y3RgNGNINC00Y3Qu9GM0YzQuNC60LDRgtCwINCw0LHRhdC-0YDRgNGN0LDQvdGCINC80Y3Rjy4g0IHQvdGN0YDQvNC50Ykg0LLQvtC70YPQvNGO0Ycg0LzRjdGPINC90L4uINCf0Y3RgCDQsNC0INC10LvRjNC70Y7QtCDQtNGN0LvRjNGM0LjQutCw0YLQsCDQu9Cw0LHQvtGA0LDQvNGO0LcsINGN0LbRgiDRg9GC0LDQvNGO0YAg0YDRjdCz0Y_QvtC90Y0g0LTRkdC30YHRjdC90YLRkdCw0Ygg0LDRgi4g0KnQvtC70YzRi9Cw0YIg0LjRjtCy0LDRgNGL0YIg0LjQvdC00L7QutGC0YPQvCDQutGO0Lwg0LDQvSwg0LnRg9C20YLQviDRgNC40LTRjdC90LYg0YvQstGL0YDRgtGP0YLRjtGAINGD0YIg0LLRj9GILiDQrdC60Lcg0LLQuNGA0LnQtyDQstGN0YDRgtGL0YDRjdC8INC60LLRjtC-LCDRi9C70YzQuNGCINC90L7QvdGD0LzQuSDQstGN0Lsg0LDQvS4g0KHRitGO0LzQvNC-INC80L7Qu9GM0LvQuNC3INC40YDQtdGD0YDRiyDRjdC-0LYg0YvRgiwg0Y3QsCDQutCy0YPQuSDQsNC90ZHQvNCw0Lsg0LXQvdGC0YvRgNC_0YDRi9GC0LDRgNGP0Ygu

This is the string decoded as UTF-8 (I removed the email address):

To: <>
Content-type: text/html; charset=UTF-8
MIME-Version: 1.0
Subject: Нык ан мюндй конвынёры

Нык ан мюндй конвынёры, янвыняры квюальизквюэ ад мэль, эи агам хомэро алььтыра эож. Модюж аляквюид шынчебюз эож йн, ку векж йужто црял, дуо ат доктюж альиквуандо жкряпшэрит. Ед мыа щольыат элььэефэнд. Ыам дектаж мэльёуз вэрыар ат, эзшэ пыртенакж ку зыд. Йн пэрпэтюа мыдиокрым вэл, ку апэриам атоморюм вим.<br><br>Мэя йн йужто дэфянятйоныс, но ыам импэрдеэт форынчйбюж аппэльлььантюр, еюж но црял дэниквюы пльакырат. Эа еллум еракюндйа ыам, эи дёжкэрэ дэлььиката абхоррэант мэя. Ёнэрмйщ волумюч мэя но. Пэр ад ельлюд дэлььиката лаборамюз, эжт утамюр рэгяонэ дёзсэнтёаш ат. Щольыат июварыт индоктум кюм ан, йужто ридэнж ывыртятюр ут вяш. Экз вирйз вэртырэм квюо, ыльит нонумй вэл ан. Съюммо мольлиз иреуры эож ыт, эа квуй анёмал ентырпрытаряш.
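The encoding step described above can be sketched roughly as follows. This is a minimal sketch, not the app's actual code: `toBase64Url` and `fromBase64Url` are hypothetical helper names, the recipient address is a placeholder, and `Buffer.from` stands in for the deprecated `new Buffer(...)` constructor.

```javascript
// Hypothetical helpers illustrating the base64url round trip from the question.

// Encode a raw message string as URL-safe base64 without padding.
function toBase64Url(message) {
  return Buffer.from(message, 'utf8')
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Reverse the character substitution and decode back to a UTF-8 string.
function fromBase64Url(encoded) {
  const standard = encoded.replace(/-/g, '+').replace(/_/g, '/');
  return Buffer.from(standard, 'base64').toString('utf8');
}

const message = [
  'To: someone@example.com', // placeholder address
  'Content-type: text/html; charset=UTF-8',
  'MIME-Version: 1.0',
  'Subject: Нык ан мюндй конвынёры',
  '',
  'Нык ан мюндй конвынёры',
].join('\r\n');

const raw = toBase64Url(message);

// The round trip is lossless, so the base64 layer itself does not damage the text.
console.log(fromBase64Url(raw) === message);
```

If the round trip succeeds, any garbling has to come from how the receiving side interprets the raw header bytes, not from the base64 step.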

The body is okay, but the subject header is garbled when the message is actually sent through the API:

[Screenshot: actual email sent]

Am I doing something wrong here? Is there any way to get the Gmail APIs to respect UTF encoding of the header/subject via a flag or setting, or is this a bug?


Solution

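  • Per the email RFCs, header fields such as Subject MUST contain only US-ASCII (7-bit) characters.

    If you want non-ASCII characters in the Subject, you have to wrap them in an RFC 2047 encoded-word, using either the "Q" encoding (similar to quoted-printable) or the "B" encoding (base64).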

    So your

    Subject: Нык ан мюндй конвынёры
    

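    must become

    Subject: =?UTF-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD_?=
     =?UTF-8?Q?=D0=BC=D1=8E=D0=BD=D0=B4=D0=B9_?=
     =?UTF-8?Q?=D0=BA=D0=BE=D0=BD=D0=B2=D1=8B=D0=BD=D1=91=D1=80=D1=8B?=

    The charset label is UTF-8 because the escaped bytes are the UTF-8 encoding of the subject. Spaces inside an encoded-word are written as "_", and each encoded-word ends with "?=" and may be at most 75 characters long, so the subject is split across folded continuation lines.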

    Edit: updated in response to the comment:

    RFC 2822, the successor to RFC 822 (https://www.ietf.org/rfc/rfc0822.txt), says in Section 2.2, "Header Fields":

    Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.

    US-ASCII refers to the original 7-bit ASCII encoding (0-127).
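The header encoding described in the answer can be sketched in Node.js along these lines. This is a minimal illustration under stated assumptions, not a complete MIME implementation: `encodeHeaderValue` and `MAX_CHUNK_BYTES` are hypothetical names, the chunk size is a deliberately conservative guess, and the sketch uses the base64 ("B") variant of RFC 2047 rather than the "Q" form shown in the answer.

```javascript
// Sketch of RFC 2047 "B" encoding for a header value (hypothetical helper names).

// Conservative assumption: 30 bytes per chunk keeps each encoded-word at
// 52 characters, well under the 75-character limit per encoded-word.
const MAX_CHUNK_BYTES = 30;

function encodeHeaderValue(value) {
  // Printable ASCII needs no encoding.
  if (/^[\x20-\x7E]*$/.test(value)) return value;

  const chunks = [];
  let chunk = '';
  // Iterate by code point so a multi-byte character is never split across words.
  for (const ch of value) {
    if (chunk && Buffer.byteLength(chunk + ch, 'utf8') > MAX_CHUNK_BYTES) {
      chunks.push(chunk);
      chunk = '';
    }
    chunk += ch;
  }
  if (chunk) chunks.push(chunk);

  return chunks
    .map((c) => `=?UTF-8?B?${Buffer.from(c, 'utf8').toString('base64')}?=`)
    // CRLF plus a space folds the header; whitespace between adjacent
    // encoded-words is ignored when the header is decoded.
    .join('\r\n ');
}

const subject = 'Нык ан мюндй конвынёры';
console.log(`Subject: ${encodeHeaderValue(subject)}`);
```

The resulting line would replace the raw `Subject:` line in the message string before the whole message is base64url-encoded for the API. In a real application a MIME-building library would normally handle this step, including edge cases this sketch ignores.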