I recently had a problem with HTML data URIs: my source HTML included the character ā, which rendered correctly when the HTML was loaded directly. However, when the HTML was converted to a data URI, the character rendered as Ä instead.
After digging through the resulting URI, I found that the character had been encoded as %c4%81, which seems to be the correct URI encoding of ā (its two UTF-8 bytes, 0xC4 and 0x81, percent-encoded).
I also tried base64-encoding the data URI instead, but got the same result. This happens in both Chrome and Safari.
I'm wondering whether the problem is with encoding multi-byte Unicode characters in data URIs, because ā is currently the only multi-byte character in my HTML.
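// Percent-encode the UTF-8 bytes of ā
console.log(encodeURIComponent('ā')); // %C4%81
// btoa() only accepts Latin-1 characters, so convert to a UTF-8 byte string first, see:
// https://stackoverflow.com/questions/23223718/failed-to-execute-btoa-on-window-the-string-to-be-encoded-contains-characte
console.log(btoa(unescape(encodeURIComponent('ā')))); // xIE=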
<iframe src="data:text/html,%c4%81"></iframe>
<iframe src="data:text/html;base64,xIE="></iframe>
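You need to specify the character encoding when working with text data URIs; most commonly this is UTF-8. Without it, the browser falls back to a legacy default encoding (typically windows-1252), so the UTF-8 bytes 0xC4 0x81 are decoded as two separate characters: Ä followed by an invisible control character.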
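The byte-level mix-up can be reproduced outside the browser. This is a minimal sketch: the helper names `percentDecodeBytes` and `decodeLatin1` are hypothetical, and Latin-1 decoding (each byte becomes the code point with the same value) is an assumed stand-in for the browser's legacy fallback. For these two bytes it agrees with windows-1252 as defined by the WHATWG Encoding Standard.

```javascript
// Hypothetical helper: turn a percent-encoded string such as "%c4%81"
// into raw bytes. Handles only %XX escapes and plain ASCII characters.
function percentDecodeBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    if (s[i] === '%') {
      bytes.push(parseInt(s.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      bytes.push(s.charCodeAt(i));
    }
  }
  return new Uint8Array(bytes);
}

// Hypothetical helper: Latin-1 decoding, one character per byte.
// Assumed stand-in for the browser's fallback when no charset is declared.
function decodeLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

// Show each character as a U+XXXX code point so invisible ones are visible.
const codePoints = (str) =>
  [...str].map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

const bytes = percentDecodeBytes('%c4%81');
const asLatin1 = decodeLatin1(bytes);                  // what the page showed
const asUtf8 = new TextDecoder('utf-8').decode(bytes); // what was intended

console.log(codePoints(asLatin1)); // [ 'U+00C4', 'U+0081' ]  (Ä + control char)
console.log(codePoints(asUtf8));   // [ 'U+0101' ]            (ā)
```

The bytes in the URI are correct in both cases; only the decoder differs, which is why declaring the charset fixes it.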
If you simply add ;charset=UTF-8 to the MIME type, the browser decodes the character correctly:
<iframe src="data:text/html;charset=UTF-8,%c4%81"></iframe>
<iframe src="data:text/html;charset=UTF-8;base64,xIE="></iframe>
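If the data URI is built in JavaScript, a sketch like the following adds the charset parameter in both the percent-encoded and base64 forms. The function name `htmlToDataUri` and its `base64` option are hypothetical. It uses `TextEncoder` to get the UTF-8 bytes before calling `btoa`, instead of the `unescape(encodeURIComponent(...))` trick from the question, because `unescape` is deprecated; for ā both approaches give `xIE=`.

```javascript
// Hypothetical helper: build a text/html data URI that declares UTF-8.
function htmlToDataUri(html, { base64 = false } = {}) {
  const prefix = 'data:text/html;charset=UTF-8';
  if (!base64) {
    // encodeURIComponent percent-encodes the UTF-8 bytes of each character.
    return `${prefix},${encodeURIComponent(html)}`;
  }
  // btoa() only accepts characters in the 0-255 range, so first turn the
  // UTF-8 bytes into a "binary string" with one character per byte.
  const bytes = new TextEncoder().encode(html);
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return `${prefix};base64,${btoa(binary)}`;
}

console.log(htmlToDataUri('ā'));                   // data:text/html;charset=UTF-8,%C4%81
console.log(htmlToDataUri('ā', { base64: true })); // data:text/html;charset=UTF-8;base64,xIE=
```

The result can then be assigned to an iframe's `src`, e.g. `iframe.src = htmlToDataUri(html)`, where `iframe` and `html` stand for your own element and markup.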