
Perl Regular expression to read square bracket


I would like to read the bit inside the square brackets, and I also want the square brackets themselves. The tricky part is class4: sample1[1] is not a bit. The bit is only at the end of the line.

Example:

File1.txt
class1->Signal = sample1_sample2.sample3_sample4[4:4];
class2->Signal = sample1.sample2.sample3_sample4_sample5[2];
class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];
class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];

Expected result:

class1 bit = [4:4]
class2 bit = [2]
class3 bit = [7:3]
class4 bit = [7:3]

I used a regular expression, but the square brackets cannot be read. [] = Used for set of characters. . = Any character except newline. ref: https://www.geeksforgeeks.org/perl-regex-cheat-sheet/

My CODE:

my $file = "$File1.txt";
my $line;

open (FILE,"<", $file) or die "Cannot open a file: $!";
while (<FILE>){
    my $line = $_;
    if ($line =~ m/[..]/){
        $line = $&;
    }
}
close (FILE);

The result only shows: .........

I hope you can help me with some ideas. Thanks.


Solution

  • [..] creates a character class, which matches any single character listed within the brackets: a period in this case.

    Since you are only matching literal periods, this is all you see.
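
    As a minimal sketch of the difference (the sample line is taken from the question's File1.txt; the variable names are only illustrative), the unescaped class can be compared with escaped brackets:

    ```perl
    use strict;
    use warnings;

    # Sample line from the question's File1.txt.
    my $line = 'class2->Signal = sample1.sample2.sample3_sample4_sample5[2];';

    # [..] is a character class holding only a period, so it matches one ".".
    my ($class_match) = $line =~ /([..])/;

    # \[ and \] are literal brackets; [^\]]* is everything between them.
    my ($bracket_match) = $line =~ /(\[[^\]]*\])/;

    print "class match:   $class_match\n";    # .
    print "bracket match: $bracket_match\n";  # [2]
    ```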

    This problem can be solved with a fairly simple regex.

    Since you only want the last bracket, you can rely on the greediness of .* to skip any brackets in the middle:

    use strict;
    use warnings;

    my $file = "File1.txt";
    my $line;

    open (FILE, "<", $file) or die "Cannot open a file: $!";
    while (<FILE>){
        $line = $_;
        # $1 is the class name, $2 the last bracketed bit before the ";"
        if( $line =~ /(class\d).*(\[[^\]]*\]);/ ){
            $line = "$1 bit = $2";
            print "$line\n";
        }
    }
    close (FILE);
    

    The regex /(class\d).*(\[[^\]]*\]);/ matches class followed by a digit. Because .* is greedy, it first matches the rest of the line and then gives back just enough characters for (\[[^\]]*\]); to match, which is why the last bracket on the line is the one captured.
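
    To illustrate why greediness matters here, the following sketch runs a greedy and a non-greedy pattern against the class4 line from the question. The non-greedy variant is not part of the solution; it is only shown for contrast, and the variable names are made up for the example:

    ```perl
    use strict;
    use warnings;

    my $line = 'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];';

    # Greedy .* takes as much as possible, so the capture is the last bracket.
    my ($greedy) = $line =~ /class\d.*(\[[^\]]*\]);/;

    # Non-greedy .*? takes as little as possible and stops at the first bracket.
    # The trailing ; is left out of this pattern so the first bracket can match.
    my ($lazy) = $line =~ /class\d.*?(\[[^\]]*\])/;

    print "greedy: $greedy\n";  # [7:3]
    print "lazy:   $lazy\n";    # [1]
    ```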

    Using ^ as the first character in a character class makes it match anything EXCEPT the characters within. To match a literal [ you have to escape it, like \[.

    (              # capture to $1 
        class\d    # match "class" followed by a digit
    )              # end capture
    .*             # match anything (greedy)
    (              # capture to $2
        \[         # literal [
        [^\]]*     # match anything, except ] (greedy)
        \]         # literal ]
    )              # end capture
    ;              # match ;
    

    The parentheses save whatever is matched within them to the variables $1, $2, and so on.
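
    The commented layout can also be used as real code by adding the /x flag, which lets Perl ignore whitespace and comments in the pattern. This is a sketch under that assumption; $re, @lines, and @out are illustrative names, and the two input lines are copied from the question:

    ```perl
    use strict;
    use warnings;

    # Same regex as above, spread out with /x.
    # (Comments inside the pattern avoid dollar signs, since those interpolate.)
    my $re = qr/
        (class\d)      # first capture: "class" followed by a digit
        .*             # anything, greedy
        (              # second capture
            \[         # literal opening bracket
            [^\]]*     # anything except a closing bracket
            \]         # literal closing bracket
        )
        ;              # literal semicolon
    /x;

    my @lines = (
        'class1->Signal = sample1_sample2.sample3_sample4[4:4];',
        'class4->Signal = sample1[1]+sample2_sample3.sample4.sample5sample7[7:3];',
    );

    my @out;
    for my $line (@lines) {
        push @out, "$1 bit = $2" if $line =~ $re;
    }
    print "$_\n" for @out;
    ```

    For these two lines it prints class1 bit = [4:4] and class4 bit = [7:3].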

    This can also be done with a substitution (s///), using the same regex and the /r flag (available since Perl 5.14) to return the new value instead of modifying $_:

    while (<FILE>){
        $line = s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;
        print $line;    # the newline read from the file is still attached
    }
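
    One thing to keep in mind with /r is that a line that does not match is returned unchanged, so it is not filtered out. A small sketch of that behaviour, assuming Perl 5.14 or later (the second input line is a made-up example of a non-matching line, and @lines and @results are illustrative names):

    ```perl
    use strict;
    use warnings;

    my @lines = (
        "class3->Signal = sample1+sample2_sample3.sample4.sample5sample7[7:3];\n",
        "a line without any bit\n",    # hypothetical non-matching line
    );

    my @results;
    for (@lines) {
        chomp(my $copy = $_);
        # /r returns the changed copy, or the original string when nothing matches.
        push @results, $copy =~ s/(class\d).*(\[[^\]]*\]);/$1 bit = $2/r;
    }
    print "$_\n" for @results;
    ```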
    

    Here's a simple command line one-liner that'll do the same:

    perl -wlp -e 's/(class\d).*(\[[^\]]*\]);/$1 bit = $2/' File1.txt
    

    Change ' to " to run it on Windows.