Why is "no [] operator overload for type Result" reported for a char[] array in D?


I was playing around with std.range and std.algorithm and bumped into the following issue.

import std.algorithm : find;
import std.range : enumerate;
import std.stdio : writeln;

int[] arr1 = [-1, 1, 2, 3, 5, 8];
char[] arr2 = ['a', 'b', 'c', 'd', 'e'];

auto res1 = arr1.enumerate.find!(t => t[0] == 4);
auto res2 = arr2.enumerate.find!(t => t[0] == 4);

assert(typeof(res1).stringof == typeof(res2).stringof);

Now, I'd like to access the result of the find on arr1, which is [Tuple!(ulong, "index", int, "value")(4, 5), Tuple!(ulong, "index", int, "value")(5, 8)].

writeln(res1[0][1]); // 5

I correctly get 5. Now, if I do the same with res2, which is equal to [Tuple!(ulong, "index", dchar, "value")(4, 'e')],

writeln(res2[0][1]); // Error: no [] operator overload for type Result

compilation fails with the error above (scratching my head). Can you please explain why it works with an int[] array and does not work with a char[] array?

UPDATE: If I call res2.array[0][1], it works but I would expect the error message to be more revealing.


Solution

  • The reason is that the Phobos library considers int[] to be a random-access range, but char[] to be a sequential-access range only, so the range built on top of char[] does not have the [] operator.

    OK, why is that? This is what the D community calls "autodecoding". A char[] is a UTF-8 string. Phobos, trying to be helpful, converts those UTF-8 sequences into a series of dchars, which represent Unicode code points.

    UTF-8 sequences have variable length. Most English text will have one byte corresponding to one character on screen, but this is not generally true of other languages. Accent marks, for example, may be represented by various two- or three-byte sequences. (It gets even more complex in some cases, with identical visual representations having different internal representations. std.uni.byGrapheme is the part of the Phobos library that is meant to help with this.)

    Anyway, Phobos (again trying to be helpful, even though in hindsight nearly everyone in the community considers this a design mistake) condenses those possibly multibyte sequences down to one dchar at a time as it loops. It cannot know where the N'th dchar starts without scanning the string up to that point, since each code point may occupy a different number of bytes and you must examine it to know its size. So it cannot do this cheaply.

    Since the [] operator is supposed to be a cheap, O(1) constant time (and constant memory) operation, this implementation is too complex to conform to the interface and you get the error instead.

    The .array function just allocates a big buffer and does all the decoding work up-front, instead of on demand, and thus allows random access... but at the cost of both memory and processing time that may not be necessary if you only need to look at a little bit of the result.
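
If you want to see this for yourself, here is a minimal sketch, assuming a recent Phobos with autodecoding enabled (the default). The static asserts check the range traits described above, and the two helper functions (viaFront and viaCodeUnits are names made up for this example, not part of Phobos) show two ways to get at the element without calling .array: asking only for .front, or opting out of autodecoding with std.utf.byCodeUnit.

```d
import std.algorithm : find;
import std.range : enumerate, walkLength;
import std.range.primitives : ElementType, isRandomAccessRange;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Hypothetical helper: stays lazy and only asks for the first match.
// Assumes a match exists; calling front on an empty range is an error.
dchar viaFront(char[] s, size_t idx)
{
    auto r = s.enumerate.find!(t => t.index == idx);
    return r.front.value; // front exists on every non-empty input range
}

// Hypothetical helper: opts out of autodecoding, so [] is available again.
// Note that the index now counts UTF-8 code units, not code points.
char viaCodeUnits(char[] s, size_t idx)
{
    auto r = s.byCodeUnit.enumerate.find!(t => t.index == idx);
    return r[0].value;
}

void main()
{
    // int[] is random access; char[] is autodecoded into dchar and is not.
    static assert( isRandomAccessRange!(int[]));
    static assert(!isRandomAccessRange!(char[]));
    static assert(is(ElementType!(char[]) == dchar));

    // Variable-length encoding: 6 bytes in memory, 5 decoded code points.
    char[] accented = "h\u00E9llo".dup;
    assert(accented.length == 6);
    assert(accented.walkLength == 5);

    char[] arr2 = ['a', 'b', 'c', 'd', 'e'];
    writeln(viaFront(arr2, 4));
    writeln(viaCodeUnits(arr2, 4));
}
```

For pure ASCII input both helpers agree. With non-ASCII text the byCodeUnit version indexes bytes rather than characters, so which one you want depends on what "the N'th element" should mean in your program.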