
How can I use unicode in "mailto" protocol?


I want to launch the default e-mail client via the ShellExecute function.

For example, I write something like this:

ShellExecute(0, 'open', 'mailto:[email protected]?subject=example&body=example', nil, nil, SW_SHOWNORMAL);

How can I encode non-US characters in subject and body?

I can't use the default ANSI code page, because the characters can be anything: Chinese, Cyrillic, or something else.

P.S. Notes:

  1. I'm using ShellExecuteW function.
  2. Leaving subject and body "as is" will not work (tested with Windows Live Mail client on Win7 and Outlook Express on WinXP).
  3. Encoding subject as URLEncode(UTF8Encode(Subject)) will work for Windows Live Mail, but won't work for Outlook Express.
  4. URLEncode(UTF8Encode(Body)) does not work in either client.

Solution

  • [email protected]?subject=example&body=%e5%85%ad

    The short answer is no: non-ASCII characters cannot appear in the URI as-is. They must be percent-encoded as defined by RFC 3986 and its predecessors. RFC 2368 defines the structure of the mailto URI.
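A minimal sketch of that encoding step in portable C, assuming the subject or body has already been converted to UTF-8 (on Windows, for example, with WideCharToMultiByte and CP_UTF8). It keeps only the RFC 3986 unreserved characters and escapes every other byte. The function name percent_encode is hypothetical, not part of any Windows API.

```c
#include <stdlib.h>
#include <string.h>

/* RFC 3986 section 2.3 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: percent-encode a NUL-terminated UTF-8 string.
   Unreserved bytes are copied; every other byte (including each byte of a
   multi-byte UTF-8 sequence) is written as %XX with uppercase hex digits.
   Returns a malloc'd string the caller must free, or NULL on allocation failure. */
char *percent_encode(const char *utf8) {
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *s = (const unsigned char *)utf8;
    char *out = malloc(strlen(utf8) * 3 + 1); /* worst case: 3 chars per byte */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        if (is_unreserved(*s)) {
            *p++ = (char)*s;
        } else {
            *p++ = '%';
            *p++ = hex[*s >> 4];
            *p++ = hex[*s & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

Encoding only the unreserved set is deliberately conservative: it also escapes characters such as & and =, which would otherwise be read as separators in the query part. The result is what goes after subject= or body=.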

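    #include <windows.h>
    #include <shellapi.h> /* ShellExecuteW; link with shell32.lib */
    
    int main(void) {
      /* The body is U+516D as UTF-8 (E5 85 AD), percent-encoded. */
      HINSTANCE result = ShellExecuteW(NULL, L"open",
        L"mailto:[email protected]?subject=example&body=%e5%85%ad",
        NULL, NULL, SW_SHOWNORMAL);
    
      /* ShellExecuteW returns a value greater than 32 on success. */
      return ((INT_PTR)result > 32) ? 0 : 1;
    }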
    

    The body in this case is the CJK character U+516D (六) encoded as UTF-8 (E5 85 AD). This works correctly with Mozilla Thunderbird; if the character does not display, you may need to install additional fonts.
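To show where E5 85 AD comes from, here is a minimal sketch of UTF-8 encoding for a single code point, following the RFC 3629 bit layout. U+516D lies in the three-byte range (U+0800 to U+FFFF), so its 16 bits are split 4/6/6 into 1110xxxx 10xxxxxx 10xxxxxx. The function name utf8_encode is hypothetical; on Windows you would normally let WideCharToMultiByte with CP_UTF8 convert a whole string.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is not a valid
   Unicode scalar value (a UTF-16 surrogate or a value above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates cannot be encoded on their own */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* top 4 bits */
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* middle 6 bits */
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* low 6 bits */
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Each resulting byte is then percent-encoded, giving %E5%85%AD. RFC 3986 treats %e5 and %E5 as equivalent but recommends uppercase hex digits for consistency.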

    The rest depends on how your user agent (mail client) interprets the URI. RFC 3986 recommends UTF-8, and RFC 6068 (which replaced RFC 2368) requires it for mailto, but earlier specifications did not specify an encoding. A user agent may fail to interpret the data correctly if it predates these specifications, has not been updated, or keeps backward compatibility with older implementations.

    Note: functions named URLEncode usually implement the HTML application/x-www-form-urlencoded encoding, which replaces spaces with plus signs. In a mailto URI a plus sign is a literal +, so encode spaces as %20 instead.
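A minimal sketch of the difference, in portable C and assuming UTF-8 input. The function encode_query_value and its form_style flag are hypothetical, written only to contrast form encoding with the RFC 3986 percent-encoding a mailto URI expects.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration.
   form_style != 0: mimics application/x-www-form-urlencoded (space -> '+').
   form_style == 0: RFC 3986 percent-encoding (space -> "%20").
   In both modes, bytes outside A-Z a-z 0-9 - . _ ~ become %XX.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *encode_query_value(const char *utf8, int form_style) {
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *s = (const unsigned char *)utf8;
    char *out = malloc(strlen(utf8) * 3 + 1); /* worst case: 3 chars per byte */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        if ((*s >= 'A' && *s <= 'Z') || (*s >= 'a' && *s <= 'z') ||
            (*s >= '0' && *s <= '9') || strchr("-._~", *s) != NULL) {
            *p++ = (char)*s;
        } else if (*s == ' ' && form_style) {
            *p++ = '+'; /* form-style space: wrong for mailto */
        } else {
            *p++ = '%';
            *p++ = hex[*s >> 4];
            *p++ = hex[*s & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

For the input "a b+c", form style yields "a+b%2Bc" while RFC 3986 style yields "a%20b%2Bc". A mail client that decodes the first string per RFC 3986 shows "a+b+c"; the second decodes back to "a b+c".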

    Note 2: I'm not up to date on the state of IRI (RFC 3987) support in the Windows shell, but it's probably worth looking into. Even with IRIs, some characters in the query part would still need to be percent-encoded.