How to read a string character by character as a range in D?


I know there are ranges in D, but I just wondered how to simply iterate over each character of a string using this concept.

To show what I'm after, the equivalent code in Go is:

for _, someChar := range someString {
    // Do something
}

Solution

  • That would depend on whether you want to iterate over code units or code points. The language itself iterates over arrays by array elements, and strings are arrays of code units, so if you simply use foreach with type inference, then with

    foreach(c; "La Verité")
        writeln(c);
    

    the last two characters printed would be gibberish, because é is a code point made up of two UTF-8 code units, and you're printing out individual code units (since char is a UTF-8 code unit). Whereas, if you do

    foreach(dchar c; "La Verité")
        writeln(c);
    

    then the runtime will decode the code units to code points, and é will be printed as the last character. But none of this is really operating on strings as ranges. foreach operates on arrays natively without having to use the input range API. However, for all string types, the range API looks like

    @property bool empty();
    @property dchar front();
    void popFront();
    

    It operates on strings as ranges of dchar, not of their code unit type. This avoids issues with functions like std.algorithm.filter operating on individual code units, which would make no sense. Operating on code points isn't 100% correct either, since Unicode gets very complicated with regard to combining code points and graphemes, but operating on code points is far closer to being correct (and I believe there's work being done on adding range support for graphemes into the standard library for the cases where you need that and are willing to pay the performance hit). So, having the range API for strings operate on them as ranges of dchar is far more correct, and if you did something like

    foreach(c; filter!"true"("La Verité"))
        writeln(c);
    

    you would be iterating over dchar, and é would print correctly. The downside to all of this, of course, is that foreach on strings operates at the code unit level by default, whereas the range API for strings operates on them as code points, so you have to be careful when mixing array operations and range-based operations on strings. That's also why string and wstring are not considered random-access ranges, just bidirectional ranges. You can't do random access in O(1) on code points when they're made up of varying numbers of code units (whereas dstring is a random-access range, because with UTF-32, every code unit is a code point).
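To make the difference concrete, here is a minimal sketch that drives the range primitives by hand and compares the result with plain foreach. It assumes a Phobos version in which strings are auto-decoded by std.range.primitives, as described above. The helper names codePoints and codeUnits are made up for this illustration, and the é is written as the escape \u00E9 so that it is unambiguously a single precomposed code point.

```d
import std.range.primitives : empty, front, popFront,
    isBidirectionalRange, isRandomAccessRange, walkLength;
import std.stdio : writeln;

// Hypothetical helper: collects code points by calling the range
// primitives directly, the way range-based algorithms consume a string.
dchar[] codePoints(string s)
{
    dchar[] result;
    while (!s.empty)        // s is a local slice; the caller's string is untouched
    {
        result ~= s.front;  // front decodes one code point (a dchar)
        s.popFront();       // skips every code unit of that code point
    }
    return result;
}

// Hypothetical helper: collects raw UTF-8 code units via plain foreach.
char[] codeUnits(string s)
{
    char[] result;
    foreach (c; s)          // type inference gives char, i.e. one code unit
        result ~= c;
    return result;
}

// The range traits reflect the variable-width encodings.
static assert( isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!isRandomAccessRange!wstring);
static assert( isRandomAccessRange!dstring);

void main()
{
    string s = "La Verit\u00E9";

    writeln(codeUnits(s).length);   // 10: the final character takes two code units
    writeln(codePoints(s).length);  // 9: one dchar per character
    writeln(s.length, " ", s.walkLength); // 10 9: array length vs. range length
}
```

The mismatch between s.length and s.walkLength is the practical consequence of mixing array operations with range operations on the same string. If grapheme-level iteration is needed, current Phobos releases provide std.uni.byGrapheme.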