Search code examples
cregexperlintrospectionxs

How to introspect regexes in the Perl API


I'm working on some code that needs to serialize Perl regexes, including any regex flags. Only a subset of flags are supported, so I need to detect when unsupported flags like /u are in the regex object.

The current version of the code does this:

static void serialize_regex_flags(buffer *buf, SV *sv) {
  char flags[] = {0,0,0,0,0,0};
  unsigned int i = 0, f = 0;
  STRLEN string_length;
  char *string = SvPV(sv, string_length);

Then manually processes string char-by-char to find flags.

The problem here is that the stringification of regex flags changed (I think in Perl 5.14) from e.g. (?i-xsm:foo) to (?^i:foo), which makes parsing a pain.

I could check the version of perl, or just write the parser to handle both cases, but something tells me there must be a superior method of introspection available.


Solution

  • In Perl, you'd use re::regexp_pattern.

     my $re = qr/foo/i;
     my ($pat, $mods) = re::regexp_pattern($re);
     say $pat;   # foo
     say $mods;  # i
    

    As you can see from the source of regexp_pattern, there's no function in the API to obtain that information, so I recommend that you call that function too from XS too.

    perlcall covers calling Perl functions from C. I came up with the following untested code:

    /* Calls re::regexp_pattern to extract the pattern
     * and flags from a compiled regex.
     *
     * When re isn't a compiled regex, returns false,
     * and *pat_ptr and *flags_ptr are set to NULL.
     *
     * The caller must free() *pat_ptr and *flags_ptr.
     */
    

    static int regexp_pattern(char ** pat_ptr, char ** flags_ptr, SV * re) {
       dSP;
       int count;
       ENTER;
       SAVETMPS;
       PUSHMARK(SP);
       XPUSHs(re);
       PUTBACK;
       count = call_pv("re::regexp_pattern", G_ARRAY);
       SPAGAIN;
    
       if (count == 2) {
          /* Pop last one first. */
          SV * flags_sv = POPs;
          SV * pat_sv   = POPs;
    
          /* XXX Assumes no NUL in pattern */
          char * pat   = SvPVutf8_nolen(pat_sv); 
          char * flags = SvPVutf8_nolen(flags_sv);
    
          *pat_ptr   = strdup(pat);
          *flags_ptr = strdup(flags);
       } else {
          *pat_ptr   = NULL;
          *flags_ptr = NULL;
       }
    
       PUTBACK;
       FREETMPS;
       LEAVE;
    
       return *pat_ptr != NULL;
    }
    

    Usage:

    SV * re = ...;
    
    char * pat;
    char * flags;
    regexp_pattern(&pat, &flags, re);