Special handling of Perl range operator


I am parsing a very large file whose content looks like this:

File input_text.txt:

hello abc
0] Framework table
f1
f2
f3
0] number of entries
randomtext1
0] Test table
t1
t2
t3
0] number of entries
randomtext2
1] Test table
1] Same as framework page table
randomtext3
2] Test table
t4
t5
t6
2] number of entries
randomtext4
3] Test table
t4
t5
t6
3] number of entries
4] Test table
4] Same as framework page table
randomtext5
1] Framework table
f4
f5
f6
1] number of entries
randomtext6
foofoobar

From this file, I want to extract the table entries. The expected output is:

Here is your framework table:
f1
f2
f3
f4
f5
f6

Here is your test table:
t1
t2
t3
1] Same as framework page table
t4
t5
t6
t4
t5
t6
4] Same as framework page table

I cannot read the whole file into an array because of its size and the number of entries. I have written the following code using the range operator, but it produces unexpected results:

    $input_log_file = "input_text.txt";
    open(LOG_FILE, "$input_log_file") or die("Can't open $input_log_file to read. \n");
    while (<LOG_FILE>)
    {
        if (/Framework table/ .. /number of entries/)
        {
            next if (/Framework table/ || /number of entries/);
            push @framework, $_;
        }
        if (/Test table/ .. /Same as framework page table/)
        {
           next if (/Test table/);
           push @test, $_;
        }
        # if(/Same as framework page table/)
        # {
        #   next;
        # }
        if (/Test table/ .. /number of entries/)
        {
           next if (/Test table/ || /number of entries/);
           push @test, $_;
        }
       
    }
    
    close(LOG_FILE);

    print "\nHere is your framework table:\n";
    print @framework;
    print "\nHere is your test table:\n";
    print @test;

I don't understand how to 'break' out of the range operator so that the file is parsed correctly.

Any help please?


Solution

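  • Don't test two different ranges that share the same starting expression. Each .. operator keeps its own state, so a range that starts on "Test table" stays open until its own end pattern appears, even if the table has already ended with the other pattern. Merge them into one range that ends on either pattern: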

        while (<LOG_FILE>) {
            if (/Framework table/ .. /number of entries/)
            {
                next if /Framework table|number of entries/;
                push @framework, $_;
            }
            if (/Test table/ .. /Same as framework page table|number of entries/)
            {
                next if /Test table|number of entries/;
                push @test, $_;
            }
        }
    

    Also, instead of /Regex1/ || /Regex2/, use regex alternation: /Regex1|Regex2/.
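
For reference, here is a minimal self-contained sketch of the merged-range approach that can be run as-is. A few things in it are my own assumptions rather than part of the original code: the helper name extract_tables is hypothetical, a short in-memory sample stands in for input_text.txt, and the delimiter lines are dropped with "unless" instead of "next" so that both range operators get evaluated on every line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads an already-open
# filehandle one line at a time and returns the two tables as array refs.
sub extract_tables {
    my ($fh) = @_;
    my (@framework, @test);

    while (my $line = <$fh>) {
        # One flip-flop per table. Each ".." keeps its own state, and that
        # state persists even across calls to this subroutine.
        if ($line =~ /Framework table/ .. $line =~ /number of entries/) {
            # Keep only the lines between the delimiters.
            push @framework, $line
                unless $line =~ /Framework table|number of entries/;
        }

        # A single range for the test table that ends on EITHER terminator.
        if ($line =~ /Test table/
                .. $line =~ /Same as framework page table|number of entries/) {
            # The "Same as ..." line is kept, as in the expected output.
            push @test, $line
                unless $line =~ /Test table|number of entries/;
        }
    }
    return (\@framework, \@test);
}

# Shortened sample standing in for input_text.txt (an assumption for the demo).
my $sample = <<'END';
hello abc
0] Framework table
f1
0] number of entries
randomtext1
0] Test table
t1
0] number of entries
1] Test table
1] Same as framework page table
randomtext3
END

# Open the string as an in-memory file; for the real file use
# open my $fh, '<', 'input_text.txt' instead.
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my ($framework, $test) = extract_tables($fh);
close $fh;

print "\nHere is your framework table:\n", @$framework;
print "\nHere is your test table:\n", @$test;
```

With this sample the framework table contains only f1, and the test table contains t1 followed by the "1] Same as framework page table" line. The file is still processed one line at a time, so nothing beyond the extracted entries is held in memory.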