
Should the data attribute of an object tag be percent-encoded?


Suppose my web application renders the following tag:

<object type="application/x-pdf" data="http://example.com/test%2Ctest.pdf">
     <param name="showTableOfContents" value="true" />
     <param name="hideThumbnails" value="false" />
</object>

Should the data attribute be escaped (that is, should its path be percent-encoded) or not? In my example it is. I haven't found any specification that covers this.

addendum

Actually, I'm interested in a specification of what browser plugins that consume the data attribute should expect to see there. For example, the Adobe Acrobat plugin accepts both escaped and unescaped URIs. QWebPluginFactory, however, treats the data attribute as a human-readable (unescaped) URI, which leads to double percent-encoding. I'm wondering whether this is a bug in QWebPluginFactory or not.
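
To make the double percent-encoding concrete, here is a minimal Python sketch. It is only an illustration and not QWebPluginFactory's actual code: it assumes a consumer that percent-encodes whatever string it receives, and the variable names are hypothetical.

```python
from urllib.parse import quote, unquote

# The human-readable (unescaped) path from the example URL.
raw_path = "/test,test.pdf"

# Encoding once gives the form used in the data attribute above.
encoded_once = quote(raw_path, safe="/")

# A consumer that assumes its input is unescaped encodes it again,
# so the "%" of "%2C" itself becomes "%25".
encoded_twice = quote(encoded_once, safe="/")

print(encoded_once)
print(encoded_twice)
# Decoding the doubly encoded path once does not recover the original path.
print(unquote(encoded_twice) == raw_path)
```

Decoding the doubly encoded value once only gets back to the singly encoded form, so the requested resource name no longer matches the file.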


Solution

  • The data attribute expects the value to be a URI. So you should provide a value that is a syntactically valid URI.

    The current specification of URIs is RFC 3986. To see whether the , in the URI’s path needs to be encoded, take a look at how the path production rule is defined:

    path          = path-abempty    ; begins with "/" or is empty
                  / path-absolute   ; begins with "/" but not "//"
                  / path-noscheme   ; begins with a non-colon segment
                  / path-rootless   ; begins with a segment
                  / path-empty      ; zero characters
    

    Since we have a URI with authority information, we need to take a look at path-abempty (see URI production rule):

    path-abempty  = *( "/" segment )
    

    A segment is zero or more pchar characters, where pchar is defined as follows (I’ve already expanded the production rules):

    pchar         = ALPHA / DIGIT / "-" / "." / "_" / "~" / "%" HEXDIG HEXDIG / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / ":" / "@"
    

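    As you can see, pchar includes a literal , so you don’t need to encode the , in the path component. It is also fine to use %2C instead of ,: a , that has no delimiting role in the path is interpreted as the data octet it stands for, which is exactly what %2C represents. One caveat: , is a reserved (sub-delims) character, so if the server assigns it a special meaning within path segments, the literal and the encoded form are not equivalent.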
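
The grammar check above can be sketched in Python. This is a minimal illustration under stated assumptions: the regular expression is transcribed by hand from the expanded pchar rule, and `is_valid_path_abempty` is a hypothetical helper name, not part of any library.

```python
import re
from urllib.parse import urlsplit

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# path-abempty = *( "/" segment ), where segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def is_valid_path_abempty(path: str) -> bool:
    """Return True if the whole string matches the path-abempty rule."""
    return PATH_ABEMPTY.fullmatch(path) is not None


# urlsplit returns the path component without decoding it.
encoded = urlsplit("http://example.com/test%2Ctest.pdf").path
literal = urlsplit("http://example.com/test,test.pdf").path

print(is_valid_path_abempty(encoded))
print(is_valid_path_abempty(literal))
print(is_valid_path_abempty("/test test.pdf"))
```

Both the literal and the percent-encoded comma pass the check, while a path containing a raw space does not, because a space is not in pchar and must be written as %20.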