regex perl text-extraction

Perl: apply regular expressions in sequence


I need help writing a Perl one-liner to

  1. find a line in a file and
  2. extract a floating-point number (possibly in exponent notation) from that line.

For example, I have a text file called results.log:

...
TOL: 0.0244141
ort: 0.000282395
Q orthogonality: True

EPS: 0.000488281
err: 9.58692e-05
QR decomposition: True

Success: True
...

It contains results of a numerical experiment. I would like to find a line that starts with TOL: and extract the tolerance value 0.0244141. I can write a one-liner to find a line starting with TOL:

perl -ne '/^TOL:/ && print' results.log
TOL: 0.0244141

I can also find a line containing the floating-point number 0.0244141:

echo "TOL: 0.0244141" | perl -ne '/\d+\.\d+/ && print'

Is there a way to "stack" two regular expressions together and apply them in sequence one after another to extract the numerical value itself? In other words, is it possible to apply a regular expression onto a result of a preceding regular expression?

To complete the task I would like to call this one-liner from a Perl script and store the extracted result into a variable:

my $tol = system( qq{ perl -ne '... && print' results.log } );

Solution

  • A nice and flexible solution is to read the values into a hash; you can then use the values as you please.

    use strict;
    use warnings;
    
    my $log = "results.log";
    open my $fh, "<", $log or die "Cannot open $log: $!";
    my %log;     # declare variable to store values
    
    while (<$fh>) {   # while we can read a line from the file
        chomp;        # remove newline
        my ($key, $val) = split / *: */, $_, 2;   # split the line on :, also remove whitespace
        next unless defined $val;     # skip lines which do not contain values
        $log{$key} = $val;            # store the value in the appropriate key
    }
    
    print $log{TOL};    # <--- value is in $log{TOL}
    

    All of the values from the file are now stored in %log. Of course, if you are only interested in the TOL value, you can simply do:

    my $tol;
    while (<$fh>) {
        if (/^TOL: (.+)/) {
            $tol = $1;
            last;              # stop reading once the value is found
        }
    }
    

    The benefit of doing this in the script itself, rather than through a shell call, is that it is faster and makes error handling easier.
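
On the question of "stacking" two regular expressions: a single pattern with a capture group can do both steps, with the literal prefix finding the line and the parenthesised part extracting the number. The following is only a minimal sketch under some assumptions. The helper name `extract_value` and the number pattern are illustrative and not part of the answer above, and an in-memory filehandle holding the sample lines from the question stands in for results.log.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): return the number on
# the first line that starts with "$key:", or undef if no such line exists.
sub extract_value {
    my ($fh, $key) = @_;

    # Assumed number format: plain decimals (0.0244141) and
    # exponent notation (9.58692e-05), with an optional sign.
    my $number = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;

    while (my $line = <$fh>) {
        # One regex does both jobs: "^key:" finds the line,
        # the capture group ($number) extracts the value.
        return $1 if $line =~ /^\Q$key\E:\s*($number)/;
    }
    return;
}

# Sample lines copied from the question; the in-memory filehandle
# is a stand-in for opening the real results.log.
my $sample = "TOL: 0.0244141\nort: 0.000282395\nQ orthogonality: True\n\n"
           . "EPS: 0.000488281\nerr: 9.58692e-05\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my $tol = extract_value($fh, 'TOL');
print "TOL = $tol\n";

seek $fh, 0, 0;    # rewind before searching for another key
my $err = extract_value($fh, 'err');
print "err = $err\n";
```

From the shell, the same capture-group idea would look like `perl -ne 'print "$1\n" if /^TOL:\s*(\S+)/' results.log`. Note that `system` returns the command's exit status rather than its output, so capturing a one-liner's output from a script would need backticks or `qx{}` instead.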