
Raku Regex to capture and modify the LFM code blocks


Update: Corrected code added below

I have a Leanpub-flavored Markdown* file named sample.md. I'd like to convert its code blocks into GitHub-flavored Markdown style using a Raku regex.

Here's a sample **ruby** code, which
prints the elements of an array:

{:lang="ruby"}
    ['Ian','Rich','Jon'].each {|x| puts x}

Here's a sample **shell** code, which
removes the ending commas and
finds all folders in the current path:

{:lang="shell"}
    sed s/,$//g
    find . -type d

In order to capture the lang value, e.g. ruby from the {:lang="ruby"} and convert it into

```ruby

I use this code

my @in="sample.md".IO.lines;
my @out;
for @in.kv -> $key,$val {
    if $val.starts-with("\{:lang") {
       if $val ~~ /^{:lang="([a-z]+)"}$/ { # capture lang
           @out[$key]="```$0"; # convert it into ```ruby
           $key++;
           while @in[$key].starts-with("    ") {
                 @out[$key]=@in[$key].trim-leading;
                 $key++;
           }
           @out[$key]="```";
       }
    }
    @out[$key]=$val;
}

The line containing the Regex gives Cannot modify an immutable Pair (lang => True) error.

I've just started out using Regexes. Instead of ([a-z]+) I've tried (\w) and it gave the Unrecognized backslash sequence: '\w' error, among other things.

How do I correctly capture and modify the lang value using a regex?

* The LFM format shown here is only approximate.

Corrected code:

my @in="sample.md".IO.lines;
my \len = @in.elems;
my @out;
my $k = 0;

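while ($k < len) {
    if @in[$k] ~~ / ^ '{:lang="' (\w+) '"}' $ / {
        push @out, "```$0";   # opening fence with the captured language
        $k++;
        # copy the indented code lines, stopping at the end of the file
        while $k < len && @in[$k].starts-with("    ") {
            push @out, @in[$k].trim-leading;
            $k++;
        }
        push @out, "```";     # closing fence
    }
    push @out, @in[$k] if $k < len;
    $k++;
}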

for @out {print "$_\n"}

Solution

  • TL;DR

    • TL? Then read @jjemerelo's excellent answer, which not only provides a one-line solution but much more, in a compact form;

    • DR? Aw, imo you're missing some good stuff in this answer that JJ (reasonably!) ignores. Though, again, JJ's is the bomb. Go read it first. :)

    Using a Perl regex

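    There are many dialects of regex. The regex pattern you've used is a Perl regex, but you haven't told Raku that, so it's interpreting your regex as a Raku regex. It's like feeding Python code to perl. In a Raku regex, { ... } embeds a block of Raku code, so :lang="..." gets compiled as an assignment to the pair lang => True. That is why the error message talks about an immutable Pair, and why it is useless for spotting the real problem.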


    One option is to switch to Perl regex handling. To do that, this code:

    /^{:lang="([a-z]+)"}$/
    

    needs m :P5 at the start:

    m :P5 /^{:lang="([a-z]+)"}$/
    

    The m is implicit when you use /.../ in a context where it is presumed you mean to immediately match, but because the :P5 "adverb" is being added to modify how Raku interprets the pattern in the regex, one has to also add the m.

    :P5 only supports a limited set of Perl's regex patterns. That said, it should be enough for the regex you've written in your question.

    Using a Raku regex

    If you want to use a Raku regex you have to learn the Raku regex language.

    The "spirit" of the Raku regex language is the same as Perl's, and some of the absolute basic syntax is the same as Perl's, but it's different enough that you should view it as yet another dialect of regex, just one that's generally "powered up" relative to Perl's regexes.

    To rewrite the regex in Raku format I think it would be:

    / ^ '{:lang="' (<[a..z]>+) '"}' $ /
    

    (Taking advantage of the fact that whitespace in Raku regexes is ignored by default.)
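To see the Raku-style pattern at work on a single line, here is a minimal sketch. The `$line` and `$fence` variables are mine, made up for illustration; only the regex comes from the text above.

```raku
# Hypothetical sample line, mirroring the marker format in the question.
my $line = '{:lang="ruby"}';

my $fence;
# Quoted strings match literally, <[a..z]>+ is a character class,
# and the parentheses capture the language name as $0.
if $line ~~ / ^ '{:lang="' (<[a..z]>+) '"}' $ / {
    $fence = "```$0";    # build the GFM opening fence from the capture
}
else {
    $fence = $line;      # not a lang marker: leave the line untouched
}

say $fence;
```

Quoting the literal parts is what keeps `{`, `:` and `"` from being read as Raku regex syntax.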

    Other problems in your code

    After fixing the regex, one encounters other problems in your code.

    The first problem I encountered is that $key is read-only, so $key++ fails. One option is to make it writable, by writing -> $key is copy ..., which makes $key a read-write copy of the index passed by the .kv.
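A small sketch of what `is copy` does here, using a made-up three-element array rather than your file. One thing it shows, which is my own observation rather than something stated above: the copy is private to each iteration, so incrementing it does not make the `for` loop skip ahead.

```raku
# Hypothetical stand-in for the lines of the file.
my @in = <a b c>;
my @seen;

for @in.kv -> $key is copy, $val {
    $key++;                    # allowed now: $key is a writable copy
    @seen.push: "$key $val";   # record what this iteration saw
}

# Every element is still visited once; only the local copy changed.
.say for @seen;
```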

    But fixing that leads to another problem. And the code is so complex I've concluded I'd best not chase things further. I've addressed your immediate obstacle and hope that helps.
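If you'd rather avoid juggling indices altogether, one option is a single pass with a flag that remembers whether you're inside a code block. This is only a sketch under my own assumptions, not JJ's solution: `lfm-to-gfm` and `$in-code` are hypothetical names, code lines are assumed to be indented by exactly four spaces, and a blank or unindented line is assumed to end the block.

```raku
# Hypothetical helper: converts LFM-style code blocks to GFM fences.
sub lfm-to-gfm(@lines) {
    my @out;
    my $in-code = False;                       # are we inside a code block?
    for @lines -> $line {
        if $line ~~ / ^ '{:lang="' (\w+) '"}' $ / {
            @out.push: "```$0";                # opening fence with the language
            $in-code = True;
        }
        elsif $in-code && $line.starts-with('    ') {
            @out.push: $line.substr(4);        # drop the four-space indent
        }
        else {
            if $in-code {
                @out.push: '```';              # first non-code line closes the fence
                $in-code = False;
            }
            @out.push: $line;
        }
    }
    @out.push: '```' if $in-code;              # file ended inside a code block
    return @out;
}

.say for lfm-to-gfm('sample.md'.IO.lines) if 'sample.md'.IO.e;
```

Using `substr(4)` rather than `trim-leading` keeps any deeper indentation inside the code block intact; swap it back if you prefer the original behaviour.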