Perl: assignment within scalar and string matching (regex)

I understand the general aim of the following piece of code: it sums the numeric parts of the string (e.g. currstr="3S47M" gives seqlength=50).

But could someone explain what is happening line by line?

In particular, I have trouble understanding what value $where holds on each iteration. More precisely, I don't understand the part with the scalar function ("scalar($RLENGTH = length($&), $RSTART = length($`)+1)").

Is it correct that the assignments to RLENGTH and RSTART take place inside scalar?

Why use comma-separated assignments within scalar? What does that mean, and what is the result of evaluating it?

If anybody could help, I would be very grateful!



  my $seqlength=0; 
  my $currstr="3S47M";

  my $where = $currstr =~ /[0-9]+[M|D|N|X|=|S|H|N]/
    ? scalar($RLENGTH = length($&), $RSTART = length($`)+1) : 0;
  while ($where > 0) {
    $seqlength += substr($currstr, ($where)-1, $RLENGTH - 1) + 0;
    $currstr = substr($currstr, ($where + $RLENGTH)-1);
    $where = $currstr =~ /[0-9]+[M|D|N|X|=|S|H|N]/
      ? scalar($RLENGTH = length($&), $RSTART = length($`)+1) : 0;
  }

edit: What is the purpose of RSTART? Why would writing just scalar($RLENGTH = length($&)) not work?


  • $where = $currstr =~ /[0-9]+[M|D|N|X|=|S|H|N]/
      ? scalar($RLENGTH = length($&), $RSTART = length($`)+1) : 0;

    is equivalent to

    if ($currstr =~ /[0-9]+[M|D|N|X|=|S|H|N]/) {
       $where = scalar($RLENGTH = length($&), $RSTART = length($`)+1);
    } else {
       $where =  0;
    }

    scalar is useless here: the expression is already in scalar context, so plain parentheses would do.

    When EXPRX, EXPRY is evaluated in scalar context, EXPRX is evaluated first and its value is discarded, then EXPRY is evaluated and its value becomes the result of the whole expression. As such, the above is equivalent to

    if ($currstr =~ /[0-9]+[M|D|N|X|=|S|H|N]/) {
       $RLENGTH = length($&);
       $RSTART = length($`) + 1;
       $where = $RSTART;
    } else {
       $where =  0;
    }
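The comma-operator behaviour described above can be checked in isolation. This is only a sketch: locate() is a hypothetical helper, not part of the original code, and it uses the cleaned-up character class [DHMNSX=]. It also hints at why assigning only $RLENGTH would not be enough: $where would then hold a length rather than a 1-based start position, and the substr calls in the loop rely on the position.

```perl
use strict;
use warnings;

# locate() is a hypothetical helper wrapping the question's expression.
sub locate {
    my ($str) = @_;
    my ($RLENGTH, $RSTART);
    # Plain parentheses instead of scalar(): the assignment to $where
    # already imposes scalar context, so the comma operator runs both
    # assignments and yields the value of the last one ($RSTART).
    my $where = $str =~ /[0-9]+[DHMNSX=]/
        ? ($RLENGTH = length($&), $RSTART = length($`) + 1)
        : 0;
    return ($where, $RSTART, $RLENGTH);
}

for my $str ("3S47M", "xx3S47M") {
    my ($where, $rstart, $rlength) = locate($str);
    printf "%s: where=%d RSTART=%d RLENGTH=%d\n",
        $str, $where, $rstart, $rlength;
}
# 3S47M: where=1 RSTART=1 RLENGTH=2
# xx3S47M: where=3 RSTART=3 RLENGTH=2
```

In both cases $where ends up equal to $RSTART, the 1-based position of the match, while $RLENGTH is only set as a side effect.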

    Note that [M|D|N|X|=|S|H|N] is a weird way of writing [MDX=SHN|]: inside a character class, | is a literal character rather than alternation, and the duplicate N and | are ignored. In fact, | is probably not supposed to be there at all. I suspect it's supposed to be [DHMNSX=].
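A quick way to see the effect of the stray | is to run both character classes against a few inputs. The test strings here are made up for illustration, and $intended is just the guess stated above, not something confirmed by the original code.

```perl
use strict;
use warnings;

my $original = qr/[0-9]+[M|D|N|X|=|S|H|N]/;   # class as written in the question
my $intended = qr/[0-9]+[DHMNSX=]/;           # guessed intent, without the |

for my $str ("47M", "12=", "3|") {
    printf "%-4s original: %-8s intended: %s\n", $str,
        ($str =~ $original ? "match" : "no match"),
        ($str =~ $intended ? "match" : "no match");
}
# "47M" and "12=" match both patterns; "3|" matches only the original,
# because | is an ordinary member of that character class.
```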

    If I understand correctly, the code could have been written as follows:

    my $currstr = "3S47M";
    my $seqlength = 0; 
    while ($currstr =~ /([0-9]+)[DHMNSX=]/g) {
       $seqlength += $1;
    }

    The following might even be sufficient (though not equivalent, since it sums every run of digits regardless of what follows it):

    my $currstr = "3S47M";
    my $seqlength = 0; 
    while ($currstr =~ /[0-9]+/g) {
       $seqlength += $&;
    }
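To make the "not equivalent" caveat concrete, here is a small sketch that wraps both loops in subs and runs them on two inputs. The sub names and the second input ("10I5M", which has a letter outside the character class) are assumptions chosen for illustration. The second sub uses a capture group instead of $&, which should behave the same here.

```perl
use strict;
use warnings;

# Sum only the numbers that are followed by one of the listed letters.
sub sum_with_ops {
    my ($str) = @_;
    my $total = 0;
    while ($str =~ /([0-9]+)[DHMNSX=]/g) {
        $total += $1;
    }
    return $total;
}

# Sum every run of digits, whatever character follows it.
sub sum_all_digits {
    my ($str) = @_;
    my $total = 0;
    while ($str =~ /([0-9]+)/g) {
        $total += $1;
    }
    return $total;
}

for my $str ("3S47M", "10I5M") {
    printf "%s: with_ops=%d all_digits=%d\n",
        $str, sum_with_ops($str), sum_all_digits($str);
}
# 3S47M: with_ops=50 all_digits=50
# 10I5M: with_ops=5 all_digits=15
```

The two versions agree on the question's example and diverge only when a number is followed by a character that is not in the class.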