Why does the resulting username:password string have to be encoded
with Base64 in the Authorization header? What's the background to this?
This is the production rule for the userid-password tuple before it’s encoded:
userid-password = [ token ] ":" *TEXT
Here token is specified as follows:
token = 1*<any CHAR except CTLs or tspecials>
This is basically any US-ASCII character within the range of 32 to 126, excluding the special characters ( ) < > @ , ; : \ " / [ ] ? = { } as well as space and horizontal tab.
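As a rough illustration of the token rule, here is a minimal Python sketch. The names `TSPECIALS` and `is_token` are made up for this example and do not come from any library; the character set is copied from the list above.

```python
# Hypothetical helper illustrating the "token" rule; not part of any library.
# tspecials as listed above, including space and horizontal tab.
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    """Return True if s is one or more US-ASCII characters in the
    range 32..126 that are not tspecials."""
    return bool(s) and all(
        32 <= ord(c) <= 126 and c not in TSPECIALS for c in s
    )
```

Under this rule `is_token("alice")` holds, while a user ID containing a colon or a space, such as `"al:ice"` or `"a b"`, is rejected.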
And TEXT is specified as follows:
TEXT = <any OCTET except CTLs, but including LWS>
This is basically any sequence of octets (0–255) except control characters (code points 0–31 and 127), but including linear whitespace (LWS). An LWS sequence is one or more space or horizontal tab characters, optionally preceded by a CRLF:
LWS = [CRLF] 1*( SP | HT )
Although such a sequence doesn’t end a header field value, any LWS sequence has the same semantics as a single space:
All linear whitespace, including folding, has the same semantics as SP.
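So a recipient may replace any run of whitespace in a field value with a single space. To keep such sequences exactly as they are, the userid-password string is Base64-encoded before it’s placed in the field value.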
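The effect can be sketched in Python. This is only an illustration under a few assumptions: `normalize_lws` and `basic_credentials` are hypothetical helper names, the regular expression is a direct transcription of the LWS rule above, and ISO-8859-1 (`latin-1`) is assumed as the character encoding of the credentials.

```python
import base64
import re

# LWS = [CRLF] 1*( SP | HT ), transcribed as a regular expression.
LWS = re.compile(r"(?:\r\n)?[ \t]+")


def normalize_lws(value: str) -> str:
    """Replace every LWS run with a single space, as a recipient may do."""
    return LWS.sub(" ", value)


def basic_credentials(userid: str, password: str) -> str:
    """Base64-encode 'userid:password' (latin-1 is an assumption here)."""
    raw = f"{userid}:{password}".encode("latin-1")
    return base64.b64encode(raw).decode("ascii")


password = "open  sesame"  # note the two consecutive spaces
raw = "alice:" + password

# Sent as is, the password could be altered by whitespace normalization.
assert normalize_lws(raw) != raw

# The Base64 alphabet contains no whitespace, so the encoded form survives
# normalization and decodes back to the exact original string.
encoded = basic_credentials("alice", password)
assert normalize_lws(encoded) == encoded
assert base64.b64decode(encoded).decode("latin-1") == raw
```

The header would then be sent as `Authorization: Basic ` followed by the encoded value.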