Tags: coldfusion, coldfusion-10, coldfusion-11

Encoding E-Mail Addresses: EncodeForHTML or EncodeForURL


When a user registers on a site, should we use EncodeForHTML() or EncodeForURL() before storing the value in a DB?

The reason I ask this is that when I send an e-mail to someone that includes a URL that contains an email address as a URL variable, I have to use EncodeForURL(). But if this email address is already encoded using EncodeForHTML(), it will mean I have to Canonicalize() it before using EncodeForURL() on it again.

I would therefore think that EncodeForURL() is probably good, but is it 'safe' and 'correct' when storing the value in a database?

Update: The docs say that EncodeForURL() is only for values used in a URL. Therefore it seems to make sense to store the value encoded with EncodeForHTML(), then Canonicalize() it and re-encode it with EncodeForURL() when using it in a URL context. I don't know how much of a performance hit all this encoding will put on my server.


Solution

  • Copying this from my company's internal documentation. Not sure if the images uploaded correctly since Imgur is blocked at work. If not, I'll re-upload them later. I'll be publishing this and more related content to a GitHub repo in the future.


    You should store it as simple text, but make sure you scrub your data on the way in using an AntiSamy library. Once the data is safe, make sure to encode the data on the way out using the proper encoder. And FYI, there's a big difference between the output of encodeForHTML() and encodeForHTMLAttribute().
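
    To make the "plain text in, encode on the way out" idea concrete, here is a minimal sketch. The form field, datasource ("myDSN"), table ("users") and confirmation URL are all hypothetical, and isValid("email", ...) is used only as a simple stand-in for input checking; it is not an AntiSamy scrub.

    ```cfml
    <!--- Hypothetical names: form.email, datasource "myDSN", table "users" --->
    <cfset email = trim(form.email) />

    <!--- Check the value on the way in, then store the raw, unencoded text --->
    <cfif isValid("email", email)>
        <cfquery datasource="myDSN">
            INSERT INTO users (email)
            VALUES (<cfqueryparam value="#email#" cfsqltype="cf_sql_varchar" maxlength="254" />)
        </cfquery>
    </cfif>

    <!--- Encode on the way out, once per output context --->
    <cfoutput>
        <p>Registered as: #encodeForHTML(email)#</p>
        <a href="https://example.com/confirm.cfm?email=#encodeForURL(email)#">Confirm</a>
    </cfoutput>
    ```

    Because the stored value is never encoded, there is nothing to Canonicalize() before switching from the HTML context to the URL context.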

    In the below examples, substitute the variables that define email addresses with data from the DB.


    PROTIP: Don't use these encoders in CFFORM tags. Those tags take care of the encoding for you. CF 9 and below use HTMLEditFormat(), CF 10 and above most likely use encodeForHTMLAttribute().
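
    As a rough illustration of that tip (the "save.cfm" action page is hypothetical), a plain HTML form needs the encoder while the CFFORM version is given the raw value:

    ```cfml
    <cfset email = "someone@example.com" />

    <!--- Plain HTML form: encode the attribute value yourself --->
    <cfoutput>
    <form action="save.cfm" method="post">
        <input type="text" name="email" value="#encodeForHTMLAttribute(email)#" />
    </form>
    </cfoutput>

    <!--- CFFORM: pass the raw value and let the tag handle the encoding --->
    <cfform action="save.cfm" method="post">
        <cfinput type="text" name="email" value="#email#" />
    </cfform>
    ```

    Encoding the value before handing it to CFINPUT would likely result in double encoding, so it is worth viewing the page source on your CF version to confirm what the tag emits.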


    Simple Implementation

    A basic implementation is to include a single e-mail address in order to populate the "To" field of a new e-mail window.

    CFML

    <cfset email = "someone@example.com" />
    <a href="mailto:#email#">E-mail</a>

    HTML Output

    <a href="mailto:someone@example.com">E-mail</a>
    

    CFML with Proper Encoding

    <cfset email = "someone@example.com" />
    <a href="mailto:#encodeForURL(email)#">E-mail</a>

    Encoded HTML Output

    Notice that the "@" symbol is properly percent encoded as "%40".

    <a href="mailto:someone%40example.com">E-mail</a>
    

    And if you plan on showing the e-mail address on the page as part of the link:

    <cfset email = "someone@example.com" />
    <a href="mailto:#encodeForURL(email)#">#encodeForHTML(email)#</a>
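
    The link above already uses two different encoders in one tag. A small sketch for comparing encodeForHTML() and encodeForHTMLAttribute() on the same value follows; the "label" variable and the "title" attribute are assumptions added for illustration only.

    ```cfml
    <cfset email = "someone@example.com" />
    <cfset label = "Someone <someone@example.com>" />

    <cfoutput>
        <!--- URL context: encodeForURL() --->
        <!--- Attribute context: encodeForHTMLAttribute() --->
        <!--- Element content: encodeForHTML() --->
        <a href="mailto:#encodeForURL(email)#"
           title="#encodeForHTMLAttribute(label)#">#encodeForHTML(label)#</a>
    </cfoutput>
    ```

    Viewing the page source shows how each encoder treats the same characters, which is the quickest way to see the difference on your own server.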

    Attack Vector

    An advanced implementation includes e-mail addresses for "To" & "CC". It can also pre-populate the body and subject of the new e-mail.

    CFML without encoding

    <cfset email = "someone@example.com" />
    <cfset email_cc = "someone_else@example.com" />
    <cfset subject = "This is the subject" />
    <cfset body = "This is the body" />
    <a href="mailto:#email#?cc=#email_cc#&subject=#subject#&body=#body#">E-mail</a>

    HTML Output

    <a href="mailto:someone@example.com?cc=someone_else@example.com&subject=This is the subject&body=This is the body">E-mail</a>


    Notice that the subject and body parameters contain unencoded spaces. While this string will technically work, it is still open to attack.

    Imagine the value of body is set by the result of a database query. This record has been "infected" by a malicious user and the default body message has an appended "BCC" address, so some evil user can get copies of e-mails sent via this link.

    Infected Data

    <cfset body = "This is the body&bcc=someone@evil.com" />

    HTML Output

    <a href="mailto:someone@example.com?cc=someone_else@example.com&subject=This is the subject&body=This is the body&bcc=someone@evil.com">E-mail</a>


    In order to stop this MAILTO link from being infected, this string needs to be properly encoded.

    CFML with HTML Attribute Encoding

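    Since "href" is an attribute of the <a> tag, you might think to use the HTML Attribute encoder. This would be incorrect: the browser decodes the HTML entities in the attribute value before it interprets the URL, so the injected "&bcc=" parameter is still in effect when the link is clicked.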

    <cfset email = "someone@example.com" />
    <cfset email_cc = "someone_else@example.com" />
    <cfset subject = "This is the subject" />
    <cfset body = "This is the body&bcc=someone@evil.com" />
    <a href="mailto:#encodeForHTMLAttribute(email)#?cc=#encodeForHTMLAttribute(email_cc)#&subject=#encodeForHTMLAttribute(subject)#&body=#encodeForHTMLAttribute(body)#">E-mail</a>

    HTML Output

    <a href="mailto:someone&#x40;example.com?cc=someone_else&#x40;example.com&subject=This&#x20;is&#x20;the&#x20;subject&body=This&#x20;is&#x20;the&#x20;body&amp;bcc&#x3d;someone&#x40;evil.com">E-mail</a>
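
    One way to see why this output is still unsafe is to undo the HTML encoding the way a browser would before it follows the link. In this sketch, decodeForHTML() is used as a stand-in for the browser's entity decoding, and the variable names are hypothetical.

    ```cfml
    <cfset body = "This is the body&bcc=someone@evil.com" />
    <cfset attrEncoded = encodeForHTMLAttribute(body) />

    <!--- decodeForHTML() approximates the entity decoding a browser performs on attribute values --->
    <cfset asSeenByBrowser = decodeForHTML(attrEncoded) />

    <cfoutput>
        <!--- If the decoded value still contains "&bcc=", the extra parameter is still part of the URL --->
        Still injectable: #yesNoFormat(find("&bcc=", asSeenByBrowser) GT 0)#
    </cfoutput>
    ```

    If the decoded string still contains "&bcc=", the mail client receives it as a separate parameter rather than as part of the body.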


    CFML with URL Encoding

    The correct encoding of a MAILTO link is done with the URL encoder.

    <cfset email = "someone@example.com" />
    <cfset email_cc = "someone_else@example.com" />
    <cfset subject = "This is the subject" />
    <cfset body = "This is the body&bcc=someone@evil.com" />
    <a href="mailto:#encodeForURL(email)#?cc=#encodeForURL(email_cc)#&subject=#encodeForURL(subject)#&body=#encodeForURL(body)#">E-mail</a>

    HTML Output with Correct Encoding

    Notice these things about the URL encoder:

    1. Each space (" ") is converted to a plus sign ("+") instead of its expected percent value ("%20").
    2. Encoding is otherwise done using percent ("%") values.
    3. Since only the individual query parameter values are encoded, the ampersands ("&") connecting the parameters are not encoded.
    4. When the "body" parameter is encoded, the encoding covers the "&bcc=" string that was maliciously injected. This entire string is now part of the message body, which prevents the unintended "bcc" of the e-mail.

    <a href="mailto:someone%40example.com?cc=someone_else%40example.com&subject=This+is+the+subject&body=This+is+the+body%26bcc%3Dsomeone%40evil.com">E-mail</a>


    What's with the plus signs? It is up to the individual mail client (e.g. Outlook, GMail, etc.) to correctly decode these URL encoded values.
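
    If a mail client shows the plus signs literally, one possible workaround is to swap them for "%20" after URL encoding. This is only a sketch: encodeForMailto() is a hypothetical helper, not a built-in function, and it assumes encodeForURL() emits a literal "+" in the input as "%2B", so any "+" left in its output stands for a space.

    ```cfml
    <!--- Hypothetical helper: URL-encode the value, then replace "+" with "%20" --->
    <cffunction name="encodeForMailto" returntype="string" output="false">
        <cfargument name="value" type="string" required="true" />
        <cfreturn replace(encodeForURL(arguments.value), "+", "%20", "all") />
    </cffunction>

    <cfset email = "someone@example.com" />
    <cfset subject = "This is the subject" />
    <cfset body = "This is the body&bcc=someone@evil.com" />

    <cfoutput>
        <a href="mailto:#encodeForMailto(email)#?subject=#encodeForMailto(subject)#&body=#encodeForMailto(body)#">E-mail</a>
    </cfoutput>
    ```

    The injected "&bcc=" is still percent encoded by encodeForURL(), so the protection described above is unchanged; only the representation of spaces differs.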