regex, perl, grep

How to convert BRE to Perl re


I have a bunch of basic regular expressions (BREs) in a text file, one per line, normally passed to grep via its -f option, that I'd also like to load in a Perl script to add some diagnostics. But since the regex syntaxes of grep and Perl are quite different, I can't just do qr{$_} or similar.

I really don't want to convert the file holding the BREs to PCRE syntax or similar, because that would add to the requirements of the rest of the tools. Converting each BRE to Perl regex syntax inside the Perl script is therefore my preferred and least disruptive option so far.

Unfortunately, I did not find a Perl module that does this, and before I roll my own, I wanted to ask for suggestions here.


Solution

  • If using re::engine::GNU isn't an option: Perl doesn't seem to have a module for directly converting BRE-syntax regular expressions to Perl-style REs, but PCRE2 supports this through pcre2_pattern_convert().

    Here is a simple C program that reads BREs from standard input, one expression per line, and prints the corresponding Perl/PCRE-style ones, so it can easily be called from your script or used to convert the file ahead of time:

    // Compile with:
    // cc -o bre2perl -O -Wall -Wextra bre2perl.c $(pcre2-config --cflags --libs8)
    
    #define _POSIX_C_SOURCE 200809L
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define PCRE2_CODE_UNIT_WIDTH 8
    #include <pcre2.h>
    
    int main(void) {
      char *line = NULL;
      size_t linelength = 0;
      ssize_t len;
    
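      // getline() returns -1 at EOF; testing "> 1" would stop at the first
      // blank line and drop a final one-character pattern without a newline
      while ((len = getline(&line, &linelength, stdin)) != -1) {
        // Strip the trailing newline so it is not part of the pattern
        if (len > 0 && line[len - 1] == '\n') { line[--len] = '\0'; }
        // Skip blank lines (note: grep -f treats an empty pattern as
        // matching every line, so handle those separately if you rely on it)
        if (len == 0) { continue; }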
        PCRE2_UCHAR *perl_re = NULL;
        PCRE2_SIZE re_len = 0;
        int rc = pcre2_pattern_convert((PCRE2_SPTR)line, len,
                                       PCRE2_CONVERT_POSIX_BASIC,
                                       &perl_re, &re_len, NULL);
        if (rc == 0) {
          // There's a PCRE2-specific prefix to patterns that should be removed
          printf("%s\n", (const char *)perl_re
                  + (strncmp((const char *)perl_re, "(*NUL)", 6) == 0 ? 6 : 0));
          pcre2_converted_pattern_free(perl_re);
        } else {
          PCRE2_UCHAR errmsg[1024];
          pcre2_get_error_message(rc, errmsg, sizeof errmsg);
          fprintf(stderr, "error converting pattern '%s': %s\n",
                  line, (char *)errmsg);
          return EXIT_FAILURE;
        }
      }
    
      free(line); // Release the buffer allocated by getline()
      return 0;
    }
    

    Example usage:

    $ printf "%s\n" '^fo\{1,2\} bar$' | ./bre2perl
    ^fo{1,2} bar$