
Animated icon in email subject


I know about Data URIs, which let base64-encoded data such as images be embedded inline. Today I received an email (a spam one, actually) with an animated GIF icon in its subject:

[Image: the email's subject line with the animated icon]

Here is the icon alone:

[Image: the animated icon on its own]

The only explanations that came to mind were Data URIs, or that Gmail allows some kind of emoticon to be inserted in subjects. I looked at the full, detailed version of the email and highlighted the subject line in the picture below:

[Image: the full message source with the encoded subject line highlighted]

So the GIF comes from the encoded string =?UTF-8?B?876Urg==?=, which resembles the Data URI scheme, but I couldn't get the icon out of it. Here is the element's HTML source:

[Image: the HTML source of the icon element in the subject]

Long story short, there are lots of emoticons served from https://mail.google.com/mail/e/XXX, where XXX is a hexadecimal number. They are documented nowhere, or at least I couldn't find any documentation. If this is about Data URIs, how is it possible to include them in a Gmail subject? (When I forwarded the email to a Yahoo account, I saw [?] instead of the icon.) And if it's not, how is that encoded string parsed?


Solution

  • Short description:

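    They are referred to internally as goomoji. Rather than a non-standard UTF-8 extension, they appear to be Gmail-specific characters encoded as ordinary UTF-8, using code points from Unicode's Supplementary Private Use Area-A (U+F0000 to U+FFFFD), a range Unicode reserves for application-defined use. When Gmail encounters one of these characters, it replaces it with the corresponding icon. I wasn't able to find any documentation on them, but I was able to reverse engineer the format.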

    What are these icons?

    These are the same icons that appear in the "Insert emoticons" panel.

    [Image: Gmail's "Insert emoticons" panel]

    While I don't see the 52E icon in the list, there are several others that follow the same convention.

    Note that there are also some icons whose names have a prefix, such as gtalk.03C. I was not able to determine whether or how these icons can be used this way.

    What is this Data URI thing?

    It's not actually a Data URI, though it shares some similarities. It's an "encoded-word", the syntax defined in RFC 2047 for encoding non-ASCII text in email headers such as the subject. It has this form:

    =?charset?encoding?data?=
    

    So, in our example string, we have the following data.

    =?UTF-8?B?876Urg==?=
    
    • charset = UTF-8
    • encoding = B (means base64)
    • data = 876Urg==
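This split can be done mechanically. The following is a minimal Python 3 sketch, assuming a single, well-formed encoded-word; the regex is a simplification of RFC 2047, the function names `split_encoded_word` and `decode_encoded_word` are hypothetical, and only the `B` (base64) encoding is handled.

```python
import base64
import re

# Simplified RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# Does not enforce every RFC rule (e.g. the 75-character length limit).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def split_encoded_word(word):
    """Return (charset, encoding, data) for a single encoded-word."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, data = match.groups()
    return charset, encoding.upper(), data


def decode_encoded_word(word):
    """Decode a 'B' (base64) encoded-word into a Python string."""
    charset, encoding, data = split_encoded_word(word)
    if encoding != "B":
        raise NotImplementedError("only the B (base64) encoding is handled here")
    return base64.b64decode(data).decode(charset)


if __name__ == "__main__":
    print(split_encoded_word("=?UTF-8?B?876Urg==?="))  # ('UTF-8', 'B', '876Urg==')
    print(hex(ord(decode_encoded_word("=?UTF-8?B?876Urg==?="))))  # 0xfe52e
```

In real code, Python's standard library already does this: `email.header.decode_header` returns each part's raw bytes together with its charset, which is usually preferable to a hand-rolled regex.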

    So, how does it work?

    We know that somehow, 876Urg== means the icon 52E, but how?

    If we base64 decode 876Urg==, we get 0xf3be94ae. This looks like the following in binary:

    11110011 10111110 10010100 10101110
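The base64 step and the binary view above can be reproduced with the standard library. This short sketch simply prints the decoded bytes as hex and as 8-bit groups.

```python
import base64

# Base64-decode the data part of the encoded-word from the example subject.
raw = base64.b64decode("876Urg==")

# Show the bytes as hex and as 8-bit binary groups.
hex_form = raw.hex()
binary_form = " ".join(f"{byte:08b}" for byte in raw)

print(hex_form)     # f3be94ae
print(binary_form)  # 11110011 10111110 10010100 10101110
```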
    

    These bits are consistent with a 4-byte UTF-8 encoded character.

    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
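To confirm that the bytes really form a 4-byte UTF-8 sequence, the lead byte and the continuation bytes can be tested against these bit patterns. A minimal sketch; the helper names are hypothetical, and it does not reject every invalid sequence (overlong or out-of-range encodings, for example).

```python
def utf8_sequence_length(lead):
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if (lead & 0b1000_0000) == 0b0000_0000:
        return 1  # 0xxxxxxx
    if (lead & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx
    if (lead & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx
    if (lead & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def is_continuation(byte):
    """True for UTF-8 continuation bytes, which all look like 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


raw = bytes([0xF3, 0xBE, 0x94, 0xAE])
length = utf8_sequence_length(raw[0])
print(length)                                          # 4
print(all(is_continuation(b) for b in raw[1:length]))  # True
```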
    

    So the relevant (payload) bits are the following:

         011   111110   010100   101110
    

    Concatenated and regrouped into bytes, padded with leading zeros:

    00001111 11100101 00101110
    

    In hexadecimal, this value is:

    FE52E
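The extraction and regrouping steps amount to masking off the UTF-8 marker bits and shifting the payload bits into place. Here is a sketch of that for the 4-byte case only (the function name is hypothetical), cross-checked against Python's built-in UTF-8 decoder.

```python
def decode_utf8_4byte(raw):
    """Assemble the code point from a 4-byte UTF-8 sequence by hand."""
    b0, b1, b2, b3 = raw
    return (
        ((b0 & 0b0000_0111) << 18)    # 3 payload bits from 11110xxx
        | ((b1 & 0b0011_1111) << 12)  # 6 payload bits from each 10xxxxxx
        | ((b2 & 0b0011_1111) << 6)
        | (b3 & 0b0011_1111)
    )


raw = bytes([0xF3, 0xBE, 0x94, 0xAE])
code_point = decode_utf8_4byte(raw)
print(f"{code_point:X}")  # FE52E

# Cross-check against Python's own UTF-8 decoder.
print(code_point == ord(raw.decode("utf-8")))  # True
```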
    

    As you can see, apart from the FE prefix, which presumably distinguishes goomoji from other characters, it matches the 52E in the icon URL. Some testing shows that this holds for other icons too.
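Stripping the prefix can be expressed as a small check. This sketch assumes, as the observation above suggests but no documentation confirms, that every goomoji is a code point of the form U+FExxx; the range constants are the bounds of Unicode's Supplementary Private Use Area-A, and `goomoji_short_code` is a hypothetical name.

```python
# Unicode's Supplementary Private Use Area-A spans U+F0000..U+FFFFD.
PUA_A_START, PUA_A_END = 0xF0000, 0xFFFFD


def goomoji_short_code(code_point):
    """Map a code point such as U+FE52E to its short hex code, e.g. '52E'.

    Assumes the (undocumented, reverse engineered) U+FExxx convention.
    """
    if not PUA_A_START <= code_point <= PUA_A_END:
        raise ValueError(f"U+{code_point:X} is not in Private Use Area-A")
    hex_code = f"{code_point:X}"
    if not hex_code.startswith("FE"):
        raise ValueError(f"U+{hex_code} lacks the observed FE prefix")
    return hex_code[2:]


print(goomoji_short_code(0xFE52E))  # 52E
```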

    Sounds like a lot of work. Is there a converter?

    This can of course be scripted. I created the following Python code for my testing. These functions can convert the base64 encoded string to and from the short hex string found in the URL. Note, this code is written for Python 3, and is not Python 2 compatible.

    Conversion functions:

    import base64
    
    def goomoji_decode(code):
        # Base64 decode.
        binary = base64.b64decode(code)
        # UTF-8 decode into a single character.
        decoded = binary.decode('utf8')
        # Get the character's Unicode code point.
        value = ord(decoded)
        # Hex encode, trim the 'FE' prefix, and uppercase.
        return format(value, 'x')[2:].upper()
    
    def goomoji_encode(code):
        # Add the 'FE' prefix and parse the result as a hex number.
        value = int('FE' + code, 16)
        # Convert the code point to a character.
        encoded = chr(value)
        # Encode the character as UTF-8 bytes.
        binary = bytearray(encoded, 'utf8')
        # Base64 encode and return the result as a string.
        return base64.b64encode(binary).decode('utf-8')
    

    Examples:

    print(goomoji_decode('876Urg=='))
    print(goomoji_encode('52E'))
    

    Output:

    52E
    876Urg==
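Going the other way, the encoded-word can be placed in a subject line. This sketch builds it by hand for a short code such as 52E (the `FE` prefix is the reverse-engineered convention, not a documented Gmail API; `goomoji_encoded_word` is a hypothetical name) and round-trips it through the standard library's `email.header.decode_header`. Whether Gmail still renders such subjects as icons is not something this code can confirm.

```python
import base64
from email.header import decode_header


def goomoji_encoded_word(short_code):
    """Build an RFC 2047 encoded-word for a goomoji short code like '52E'.

    Assumes the reverse engineered U+FExxx convention.
    """
    char = chr(int("FE" + short_code, 16))
    data = base64.b64encode(char.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{data}?="


subject = "Hello " + goomoji_encoded_word("52E")
print(subject)  # Hello =?UTF-8?B?876Urg==?=

# Round-trip through the standard library's RFC 2047 decoder,
# which yields (raw bytes, charset) pairs.
for part, charset in decode_header(subject):
    print(part, charset)
```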
    

    And, of course, finding an icon's URL simply requires creating a new draft in Gmail, inserting the icon you want, and using your browser's DOM inspector.

    [Image: an inserted icon's URL shown in the browser's DOM inspector]
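Combining the pieces, one can list the icon URLs for every goomoji in an already decoded subject. This is a sketch under two assumptions: the U+FE000 to U+FEFFF range inferred above, and the https://mail.google.com/mail/e/XXX URL pattern from the question, which is undocumented and may change or require a Gmail session. `goomoji_urls` is a hypothetical name.

```python
import base64

# URL pattern observed in the question; undocumented and subject to change.
GOOMOJI_URL = "https://mail.google.com/mail/e/{}"


def goomoji_urls(text):
    """Return icon URLs for every U+FExxx character in a decoded subject."""
    urls = []
    for char in text:
        code_point = ord(char)
        # Assumed goomoji range: the 'FE' prefix followed by three hex digits.
        if 0xFE000 <= code_point <= 0xFEFFF:
            urls.append(GOOMOJI_URL.format(f"{code_point:X}"[2:]))
    return urls


subject = "Hello " + base64.b64decode("876Urg==").decode("utf-8")
print(goomoji_urls(subject))  # ['https://mail.google.com/mail/e/52E']
```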