
Content-Type with charset only


I came across this interesting header:

Content-Type: charset=utf-8

It appears in an answer to the question "Set HTTP header to UTF-8 using PHP".

The answerer says that this syntax is defined by RFC 2616, but I am not seeing it in the provided link. Is this valid syntax, and if so where specifically is this defined?


Solution

  • The production in RFC 2616 for the Content-Type header is this:

    Content-Type   = "Content-Type" ":" media-type
    

    And the media-type production is this:

    media-type     = type "/" subtype *( ";" parameter )
    type           = token
    subtype        = token
    

    That says that while the parameter part (e.g., charset=utf-8) is optional, the type "/" subtype part is not. That is, a media type must have a type, followed by a slash, followed by a subtype.

    So Content-Type: charset=utf-8 isn’t valid syntax under that grammar, and no other normative specification defines it as valid either.

    RFC 2616 has itself been obsoleted by RFC 7231 and several companion RFCs.

    But the corresponding parts of RFC 7231 define essentially the same productions for this case:

    The production in RFC 7231 for the value of the Content-Type header is this:

    Content-Type = media-type
    

    And the media-type production is this:

    media-type = type "/" subtype *( OWS ";" OWS parameter )
    type       = token
    subtype    = token
    

    RFC 7231 has in turn been obsoleted by RFC 9110, but RFC 9110 keeps the same requirement: a media type must begin with type "/" subtype.


    Most programming languages have media-type parsing libraries that can check the syntax. For example, with the content-type package for Node.js:

    npm install content-type
    node -e "var ct = require('content-type'); ct.parse('charset=utf-8')"
    => TypeError: invalid media type
    node -e "var ct = require('content-type'); ct.parse('image; charset=utf-8')"
    => TypeError: invalid media type
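
As a rough illustration of the grammar quoted above, here is a dependency-free sketch that writes the RFC 7231 media-type production as a regular expression in plain JavaScript. The names TOKEN, QUOTED, PARAMETER, MEDIA_TYPE, and isValidMediaType are made up for this example, and the quoted-string handling is simplified (obs-text is not accepted), so treat it as a sketch of the rule rather than a replacement for a real parser.

```javascript
// token = 1*tchar (RFC 7231 takes this rule from RFC 7230)
const TOKEN = "[!#$%&'*+.^_`|~0-9A-Za-z-]+";

// quoted-string, simplified for this sketch: obs-text (bytes 0x80-0xFF) is rejected
const QUOTED = '"(?:[\\t \\x21\\x23-\\x5B\\x5D-\\x7E]|\\\\[\\t \\x21-\\x7E])*"';

// parameter = token "=" ( token / quoted-string )
const PARAMETER = `${TOKEN}=(?:${TOKEN}|${QUOTED})`;

// media-type = type "/" subtype *( OWS ";" OWS parameter )
const MEDIA_TYPE = new RegExp(
  `^${TOKEN}/${TOKEN}(?:[ \\t]*;[ \\t]*${PARAMETER})*$`
);

// Hypothetical helper: true if value matches the media-type production above.
function isValidMediaType(value) {
  return MEDIA_TYPE.test(value);
}

console.log(isValidMediaType("text/html; charset=utf-8")); // true
console.log(isValidMediaType("charset=utf-8"));            // false: no type "/" subtype
console.log(isValidMediaType("image; charset=utf-8"));     // false: no "/" subtype
```

The regular expression makes the key point visible: the parameter group is wrapped in `(...)*` and may repeat zero times, while the leading type "/" subtype has no such quantifier and must always be present.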