Search code examples
stringrakurakudonqpmoarvm

Strings and Strands in MoarVM


When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program? In particular, I am curious whether there's a way to see how many Strands currently make up the Str (whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).

If there isn't any way to access this info at runtime, is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?

Finally, does MoarVM manage the number of Strands in a given Str? I often hear (or say) that one of Raku's super powers is that is can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n). For example,

(^$n).map({~rand}).join

seems like it would create a Str with a length proportional to $n that consists of $n Strands – and, if I'm understanding the datastructure correctly, that means that into this Str would require checking the length of each Strand, for a time complexity of O(n). But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case? Or have I misunderstood something more basic?


Solution

  • When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program?

    My educated guess is yes, as described below for App::MoarVM modules. That said, my education came from a degree I started at the Unseen University, and a wizard had me expelled for guessing too much, so...

    In particular, I am curious whether there's a way to see how many Strands currently make up the Str (whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).

    I'm 99.99% sure strands are purely an implementation detail of the backend, and there'll be no Raku or NQP access to that information without MoarVM specific tricks. That said, read on.

    If there isn't any way to access this info at runtime

    I can see there is access at runtime via MoarVM.

    is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?

    I'm 99.99% sure there are multiple ways.

    For example, there's a bunch of strand debugging code in MoarVM's ops.c file starting with #define MVM_DEBUG_STRANDS ....

    Perhaps more interesting are what appears to be a veritable goldmine of sophisticated debugging and profiling features built into MoarVM. Plus what appear to be Rakudo specific modules that drive those features, presumably via Raku code. For a dozen or so articles discussing some aspects of those features, I suggest reading timotimo's blog. Browsing github I see ongoing commits related to MoarVM's debugging features for years and on into 2021.

    Finally, does MoarVM manage the number of Strands in a given Str?

    Yes. I can see that the string handling code (some links are below), which was written by samcv (extremely smart and careful) and, I believe, reviewed by jnthn, has logic limiting the number of strands.

    I often hear (or say) that one of Raku's super powers is that is can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n).

    Yes, if a backend that supported strands did not manage the number of strands.

    But for MoarVM I think the intent is to set an absolute upper bound with #define MVM_STRING_MAX_STRANDS 64 in MoarVM's MVMString.h file, and logic that checks against that (and other characteristics of strings; see this else if statement as an exemplar). But the logic is sufficiently complex, and my C chops sufficiently meagre, that I am nowhere near being able to express confidence in that, even if I can say that that appears to be the intent.

    For example, (^$n).map({~rand}).join seems like it would create a Str with a length proportional to $n that consists of $n Strands

    I'm 95% confident that the strings constructed by simple joins like that will be O(1).

    This is based on me thinking that a Raku/NQP level string join operation is handled by MVM_string_join, and my attempts to understand what that code does.

    But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case?

    If you read the code you will find it's doing very sophisticated handling.

    Or have I misunderstood something more basic?

    I'm pretty sure I will have misunderstood something basic so I sure ain't gonna comment on whether you have. :)