Tags: string, character-encoding, d, text-to-html

Converting Text to HTML In D


I'm trying to figure out the best way of encoding text (either 8-bit ubyte[] or string) into its HTML counterpart.

My proposal so far is to use a lookup table to map the 8-bit characters that have special meaning in HTML to their entities

string[256] lutLatin1ToHTML;
lutLatin1ToHTML[0x22] = "&quot;"; // '"'
lutLatin1ToHTML[0x26] = "&amp;";  // '&'
...

and then translate the text with the function:

pure string toHTML(in string src,
                   ref in string[256] lut) {
    return src.map!(a => (lut[a] ? lut[a] : new string(a))).reduce!((a, b) => a ~ b) ;
}

It almost works, except that I don't know how to create a string from a `ubyte` (the no-translation case).

I tried

writeln(new string('a'));

but it prints garbage and I don't know why.
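The garbage has a concrete cause: new string('a') does not build a one-character string. In a new expression for a dynamic array the argument is the array length, and 'a' (value 97) implicitly converts to size_t, so it allocates 97 chars, each default-initialized to char.init (0xFF), a byte that never appears in valid UTF-8. The short program below is a sketch illustrating this, assuming a recent DMD and Phobos:

```d
import std.exception : assertThrown;
import std.stdio : writeln;
import std.utf : UTFException, validate;

void main()
{
    // 'a' is a char with value 97. In `new string(n)` the argument is the
    // array length, so this allocates 97 chars rather than the text "a".
    string s = new string('a');

    // Each element is default-initialized to char.init (0xFF), a byte that
    // never occurs in valid UTF-8, hence the garbage on output.
    writeln(s.length);             // 97
    writeln(s[0] == char.init);    // true
    assertThrown!UTFException(validate(s));
}
```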

For more details on HTML character entities, see https://en.wikipedia.org/wiki/Character_entity_reference


Solution

  • You can make a string from a ubyte most easily by doing "" ~ b. That gives valid UTF-8 for ASCII values (below 0x80), for example:

    ubyte b = 65;
    string a = "" ~ b;
    writeln(a); // prints A
    

    BTW, if you want to do a lot of HTML work, my dom.d and characterencodings.d modules might be useful: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

    It has an HTML parser, DOM manipulation functions similar to JavaScript's (e.g. ele.querySelector(), getElementById, ele.innerHTML, ele.innerText, etc.), conversion from a few different character encodings including Latin-1, and output of ASCII-safe HTML with all special and Unicode characters properly encoded.

    assert(htmlEntitiesEncode("foo < bar") == "foo &lt; bar");
    

    stuff like that.
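Two caveats apply when the input is Latin-1 rather than plain ASCII. First, "" ~ b copies the byte unchanged, so any byte of 0x80 or above ends up as a lone byte that is not valid UTF-8; because Latin-1 byte values coincide with the first 256 Unicode code points, treating the byte as a dchar and encoding that gives the right character. Second, reduce!((a, b) => a ~ b) allocates a new string at every step and throws on an empty range when no seed is given. The following is a minimal sketch of the question's toHTML along those lines, assuming the input is Latin-1 bytes. The table contents (the five characters ", &, ', <, >) and the name toHtmlLatin1 are illustrative choices, not taken from the original code or from any library:

```d
import std.array : appender;
import std.stdio : writeln;

// Illustrative table in the spirit of the question's lutLatin1ToHTML: only
// characters with special meaning in HTML get an entity; every other slot
// stays null, meaning "no translation". Built at compile time by the lambda.
immutable string[256] lutLatin1ToHTML = () {
    string[256] t;
    t['"'] = "&quot;";
    t['&'] = "&amp;";
    t['\''] = "&#39;";
    t['<'] = "&lt;";
    t['>'] = "&gt;";
    return t;
}();

// Hypothetical helper: converts Latin-1 bytes into UTF-8 text with the
// special characters replaced by entities.
string toHtmlLatin1(in ubyte[] src, ref immutable string[256] lut)
{
    auto result = appender!string();   // grows in place instead of a ~ b per step
    foreach (b; src)
    {
        if (lut[b] !is null)
            result.put(lut[b]);        // translated character
        else
            // Latin-1 byte values equal Unicode code points 0-255, so the
            // byte can be treated as a dchar; Appender!string encodes it
            // as UTF-8 (two bytes for values >= 0x80).
            result.put(cast(dchar) b);
    }
    return result.data;
}

void main()
{
    // "café <b>" encoded as Latin-1 (0xE9 is 'é')
    ubyte[] input = [0x63, 0x61, 0x66, 0xE9, 0x20, 0x3C, 0x62, 0x3E];
    writeln(toHtmlLatin1(input, lutLatin1ToHTML)); // café &lt;b&gt;
}
```

If the input is already a UTF-8 string rather than Latin-1 bytes, the same loop would instead iterate over dchars and consult the table only for code points below 256.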