
Default/correct context for HTML href attributes in Sightly


I'm using Sightly and while investigating a bug in my application I noticed a behaviour I didn't expect.

Some of the links would render with ampersands in the query string escaped twice. Example:

<a href="http://www.google.com?a=1&amp;amp;b=2&amp;amp;c=3">
    link with explicit attribute context
</a>

Upon closer inspection, it turned out that our AEM instance was running an org.apache.sling.rewriter.Transformer implementation that escaped special characters in all href attributes.

Coupled with Sightly XSS protection, this resulted in double escapes.
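
The double escaping is easy to reproduce in isolation. The following is a minimal Java sketch, not the actual transformer or Sightly code: escapeAmpersands is a hypothetical helper standing in for any escaping pass that rewrites & as &amp; without checking whether the value has already been escaped.

```java
// Hypothetical stand-in for an escaping pass; this is NOT the real
// Sling rewriter Transformer or the Sightly XSS API.
class DoubleEscapeDemo {

    // Naively escapes every ampersand, including ones that already start an entity.
    static String escapeAmpersands(String value) {
        return value.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = "http://www.google.com?a=1&b=2&c=3";
        String once = escapeAmpersands(url);    // first pass, e.g. the template engine
        String twice = escapeAmpersands(once);  // second pass, e.g. the rewriter
        System.out.println(once);
        System.out.println(twice);
    }
}
```

Running two such passes over the same value turns each & into &amp;amp;, which matches the broken link shown above.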

While investigating this further, I disabled the transformer and noticed a strange behaviour in Sightly itself.

The attribute context and the default context in href attributes don't match

Given the following three elements, I'd expect them to render the href value in the same way (with the query string escaped, consistent with W3C standards):

<a href="${'http://www.google.com?a=1&b=2&c=3'}">no explicit context, expression used</a>
<a href="http://www.google.com?a=1&b=2&c=3">no explicit context</a>
<a href="${'http://www.google.com?a=1&b=2&c=3' @ context='attribute'}">
    explicit attribute context
</a>

However, only the last one performs the escaping, and I get:

<a href="http://www.google.com?a=1&b=2&c=3">no explicit context, expression used</a>
<a href="http://www.google.com?a=1&b=2&c=3">no explicit context</a>
<a href="http://www.google.com?a=1&amp;amp;b=2&amp;amp;c=3">
    explicit attribute context
</a>

For some reason, the last one, using context='attribute' (the only one that does something with the & characters), escapes the ampersands twice, yielding invalid links.

This can be reproduced with arbitrary element and attribute names, so I think I can safely assume this is not some rewriter kicking in.

<stargate data-custom="${'http://www.google.com?a=1&b=2&c=3' @ context='attribute'}">
    attribute context in custom tag
</stargate>

Outputs:

<stargate data-custom="http://www.google.com?a=1&amp;amp;b=2&amp;amp;c=3">
    attribute context in custom tag
</stargate>

Furthermore, the Display Context Specification gave me the impression that, when rendering an attribute, the context would be picked up automatically as attribute:

To protect against cross-site scripting (XSS) vulnerabilities, Sightly automatically recognises the context within which an output string is to be displayed within the final HTML output, and escapes that string appropriately.

Is the observed behaviour here to be expected or am I looking at a potential bug in Sightly?

Which context should I be using here? All contexts apart from attribute ignore the fact that query strings should be escaped in href. The attribute context, on the other hand, appears to be doing this twice. What's going on?

I'm using Adobe Granite Sightly Template Engine (compatibility) io.sightly.bundle 1.1.72.

The uri context does not escape query strings in the way expected in HTML5 href attributes

I did also try using

<a href="${'http://www.google.com?a=1&b=2&c=3' @ context='uri'}">explicit uri context</a>

But it fails to escape the & chars, resulting in invalid HTML5.

<a href="http://www.google.com?a=1&b=2&c=3">explicit uri context</a>

Result of validation as HTML5:

Error Line 70, Column 35: & did not start a character reference. (& probably should have been escaped as &amp;.)

<a href="http://www.google.com?a=1&b=2&c=3">explicit uri context</a>

The html context correctly renders links with multiple query parameters in href attributes

It seems the only context I could possibly use here at the moment is html (text escapes & twice, just like attribute):

<a href="${'http://www.google.com?a=1&b=2&c=3' @ context='html'}">explicit html context</a>

yields

<a href="http://www.google.com?a=1&amp;b=2&amp;c=3">explicit html context</a>

Changing to this context would allow me to get the right value in the href, as rendered by the browser. However, it doesn't seem to have the correct semantics.
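
To make "the right value, as rendered by the browser" concrete, here is a rough Java sketch of what happens when an attribute value is read back. It is a simplification under stated assumptions: decodeAmpOnce is a hypothetical helper that resolves only the &amp; entity in a single pass, whereas a real HTML parser handles the full set of character references.

```java
// Simplified model of attribute-value decoding; decodeAmpOnce is a
// hypothetical helper, not a browser or Sightly API.
class HrefDecodeDemo {

    // Resolves each &amp; entity exactly once, left to right, without rescanning.
    static String decodeAmpOnce(String attributeValue) {
        return attributeValue.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escapedOnce = "http://www.google.com?a=1&amp;b=2&amp;c=3";
        String escapedTwice = "http://www.google.com?a=1&amp;amp;b=2&amp;amp;c=3";
        // Escaped once: decoding restores the intended URL.
        System.out.println(decodeAmpOnce(escapedOnce));
        // Escaped twice: a literal "&amp;" is left inside the URL.
        System.out.println(decodeAmpOnce(escapedTwice));
    }
}
```

Under this model, the output of the html context decodes back to the original URL, while the output of the attribute context still contains &amp; after decoding, so the link target is wrong.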

To quote the description of the html context from the Sightly spec:

Use this in case you want to output HTML - Removes markup that may contain XSS risks


Solution

  • For src and href attributes Sightly uses the uri XSS escaping context 1, 2.

    Furthermore, the following markup is HTML5 valid using the validator from 3:

    <!DOCTYPE html>
    <html>
    <head>
        <title>Title</title>
    </head>
    <body>
        <a href="http://www.google.com?a=1&b=2&c=3">explicit uri context</a>
    </body>
    </html>
    

    Can you please point me to the spec that requires query strings to be escaped in HTML5 attributes?