
Why are only a limited number of regular expression captures stored in `global_variables`?


If I do a match with a regular expression with ten captures:

/(o)(t)(th)(f)(fi)(s)(se)(e)(n)(t)/.match("otthffisseent")

then, for $10, I get:

$10 # => "t"

but it is missing from global_variables. I get (in an irb session):

[:$;, :$-F, :$@, :$!, :$SAFE, :$~, :$&, :$`, :$', :$+, :$=, :$KCODE, :$-K, :$,,
 :$/, :$-0, :$\, :$_, :$stdin, :$stdout, :$stderr, :$>, :$<, :$., :$FILENAME,
 :$-i, :$*, :$?, :$$, :$:, :$-I, :$LOAD_PATH, :$", :$LOADED_FEATURES,
 :$VERBOSE, :$-v, :$-w, :$-W, :$DEBUG, :$-d, :$0, :$PROGRAM_NAME, :$-p, :$-l,
 :$-a, :$binding, :$1, :$2, :$3, :$4, :$5, :$6, :$7, :$8, :$9]

Here, only the first nine are listed:

:$1, :$2, :$3, :$4, :$5, :$6, :$7, :$8, :$9

This is also confirmed by:

global_variables.include?(:$10) # => false

Where is $10 stored, and why isn’t it stored in global_variables?


Solution

  • In the implementation shown here, the numbered variables returned from Kernel#global_variables are always the same, whether or not a match has run: $1 through $9 are listed even before you do a match, and a match with more groups does not add to the list. (Numbered variables also cannot be assigned to; $10 = "foo" is a syntax error.)

    Consider the source code for the method:

    VALUE
    rb_f_global_variables(void)
    {
        VALUE ary = rb_ary_new();
        char buf[2];
        int i;
    
        st_foreach_safe(rb_global_tbl, gvar_i, ary);
        buf[0] = '$';
    
        for (i = 1; i <= 9; ++i) {
            buf[1] = (char)(i + '0');
            rb_ary_push(ary, ID2SYM(rb_intern2(buf, 2)));
        }
    
        return ary;
    }
    

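    Once you are used to reading C, the for loop makes it clear that the symbols $1 through $9 are hard-coded into the return value of the method. The buffer holds only two characters, the '$' and a single digit, so a name like $10 could never be produced by this loop.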

    So how can you still use $10 if the output of global_variables doesn't change? The output is a bit misleading: it suggests that your match data is stored in separate variables, but the numbered variables are just shortcuts that delegate to the MatchData object stored in $~.

    Essentially, $n looks at $~[n]. The :$~ entry, which comes from the global table, is part of the original output from the method, but the variable holds nil until you do a match.

    As for the justification for including $1 through $9 in the output of the function, you would need to ask someone on the Ruby core team. It might seem arbitrary, but some deliberation likely went into the decision.
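
The delegation described above can be checked directly. This is a minimal sketch that assumes only the pattern and string from the question; the local variable names (md, tenth_via_global and so on) are illustrative. It deliberately says nothing about the contents of global_variables, since that list may differ between Ruby versions.

```ruby
# Ten capture groups, as in the question.
pattern = /(o)(t)(th)(f)(fi)(s)(se)(e)(n)(t)/
pattern.match("otthffisseent")

md = $~                     # MatchData set by the last successful match
tenth_via_global = $10      # => "t"  (numbered shortcut)
tenth_via_match  = md[10]   # => "t"  (explicit lookup on the MatchData)

# Each numbered variable mirrors the corresponding capture in $~.
numbered = [$1, $2, $3, $4, $5, $6, $7, $8, $9, $10]
all_groups_agree = (numbered == md.captures)   # => true

# A failed match resets $~ to nil, and every shortcut follows it.
/(x)/.match("abc")
after_failed_match = [$~, $1, $10]             # => [nil, nil, nil]

# Putting the old MatchData back into $~ brings $10 back as well,
# which shows that $10 has no storage of its own.
$~ = md
restored_tenth = $10                           # => "t"
```

Because $10 changes whenever $~ changes, there is no separate variable for global_variables to list; the value lives in the MatchData.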