Search code examples
rakurakudomoarvm

How are dynamically scoped variables implemented in Rakudo/MoarVM?


That is, variables like $*scalar, @*array and %*hash. I'm asking this question mainly because I want to have an idea of how much of a burden they are on the overall performance of the grammar/regex engine.

To add a bit of precision to my question, I would like to know what happens :

on lookup,

my $var-1 = $*scalar
my $var-2 = @*array[3]
my $var-3 = %*hash<key>

and on insertion,

$*scalar = "new value 1"
@*array[3] = "new value 2"
%*hash{"existing key"} = "new value 3"
%*hash{"new key"}      = "new value 3"

in relationship with the symbol table, the call stack, or any data structure involved like a special stack for handling dynamical scoping, or maybe a hash table of stacks.

My main interest is with array and hashes, and if they are both entirely duplicated each time we push/pop/modify/insert an array element or insert/modify/delete a key/value pair of a hash.

EDIT: My assumptions about dynamical scoping was wrong. I thought that, when modifying a dynamically scoped hash, the changes would be undone at the end of the lexical scope in which the change has been made, similar to what happens in Perl with the local keyword and its associated "temporaries stack", hence my "Is the data structure duplicated ?" question. But this is not what dynamically scoped means it seems.

My assumption was that this following Raku code was equivalent to the following Perl code. Instead, to replicate the behavior of the Perl code, we must do what the 2nd piece of Raku code do.

Raku 1:

my %*hash = (key => "value");

sub test() {
    say "test() start";
    say "\t", %*hash;
    %*hash<key>="new value";
    say "\t", %*hash;
    say "test() end";
}

say %*hash;
test();
say %*hash;

OUTPUT:

{key => value}
test() start
        {key => value}
        {key => new value}
test() end
{key => new value}

Perl

our %hash=(key => "value");

sub test {
    local %hash=%hash;
    say "test() start";
    say "\t", %hash;
    $hash{key}="new value";
    say "\t", %hash;
    say "test() end"
}

say %hash;
test();
say %hash;

OUTPUT:

keyvalue
test() start
        keyvalue
        keynew value
test() end
keyvalue

Raku 2:

my %*hash = (key => "value");

sub test() {
    my %*hash=CALLERS::<%*hash>;
    say "test() start";
    say "\t", %*hash;
    %*hash<key>="new value";
    say "\t", %*hash;
    say "test() end";
}

say %*hash;
test();
say %*hash;

OUTPUT:

{key => value}
test() start
        {key => value}
        {key => new value}
test() end
{key => value}

Solution

  • The costs of looking up a dynamically scoped variable are independent of the sigil. For the purposes of lookup, the sigil is just part of the variable name to locate. The compilation of @a[3] = $x and @*a[3] = $x are identical except for the lookup of the @a/@*a - that is, they both result in a call to postcircumfix:<[ ]>, passing the resolved variable, the index, and the value to assign, and that leads to the assignment.

    Arrays and hashes are implemented as mutable data structures; the only significant copying that would take place is if the dynamic array has to grow or the hash table needs its backing storage to grow to continue to provide efficient lookup.

    Putting aside the fallback to GLOBAL and PROCESS, dynamic variables are stored in lexical symbol tables (thus their declaration with my). In MoarVM, the term "frame" is used for a record on the callstack, conceptually [1] created at the entry to a sub or block and destroyed at its exit. The term "static frame" is used for a data structure that represents all the things that are invariant over invocations of a sub or block. (So, if one writes a recursive sub and call it, there are many frames, but only one static frame). So far as lexical storage goes, the static frame contains a hash table mapping lexical variable names (strings) into integers. The values of a lexical are stored in the frame, as an array with the indices mapped by the static frame's hash table.

    Most lexical variable lookups ($a) entirely bypass doing named lookups, being resolved at compile time to the applicable index.

    By contrast, dynamic variable lookups ($*a) do use the static frame's lookup table. Dynamic variable lookup proceeds by walking the callstack, doing a lookup in the static frame's hash of lexicals to see if such a variable is declared, and if so the index is used to resolve it in the frame. There are some generic mechanisms and one special-case mechanism to improve performance here.

    The generic ones are:

    • Strings have their hash codes cached, so the string $*a won't have to be repeatedly hashed; the integer hash code is just there and ready for the hash table lookup.
    • Strings are interned within a compilation unit, so if the declaration and usage of a dynamic variable occur within the same file, then the string comparison can short-circuit at pointer equality.

    The special case mechanism consists of a dynamic variable lookup cache that can be established on any frame. This stores a name together with a pointer to where the dynamic variable is stored on the callstack. This provides a "shortcut" that can be especially valuable for frequently accessed dynamic variables that are declared a huge number of frames down the callstack. See this code for more details on how the cache entries are installed. (They are invalidated inside of the sliced frames when a continuation is taken; they used to also be broadly invalidated during deoptimization, but that changed with the adoption of lazy deoptimization on stack unwind a year or so ago.)

    [1] Many Raku programs running on MoarVM enjoy a relatively high rate of inlining, which eliminates the cost of creation/destruction of a frame.