This code grabs a keyword 'fun' from text files that I have and then prints the 20 characters before and after the keyword. However, I also want it to print the previous 2 lines and the next two lines, and I'm not sure how to do that. I wasn't sure if it is easier to change the code with this or just read the whole file at one time.
{my $inputfile = "file";
$searchword = 'fun';
open (INPUT, '<', $inputfile) or die "fatal error reading the file \n";
while ($line1=<INPUT>)
{
#read in a line of the file
if ($line1 =~m/$searchword/i)
{print "searchword found\n";
$keepline = $line1;
$goodline =1;
$keepline =~/(.{1,20})(fun)(.{1,20})/gi;
if ($goodline==1)
{&write_excel};
$goodline =0;
}
Your code as is seems to
$searchword
; $searchword
is found, $goodline
is unconditionally set to '1' and then tested to see if its '1' and finally reset to '0'Putting that aside, the question as to whether to read in the whole file depends on your circumstances some what - how big are the files you're going to be searching, does your machine have plenty of memory; is the machine a shared resource and so on. I'm going to presume you can read in the whole file as that's the more common position in my experience (those who disagree please keep in mind (a) I've acknowledge that its debatable; and (b) its very dependant on the circumstances that only the OP knows)
Given that, there are several ways to read in a whole file but the consensus seems to be to go with the module File::Slurp
. Given those parameters, the answer looks like this;
#!/usr/bin/env perl
use v5.12;
use File::Slurp;
my $searchword = 'fun';
my $inputfile = "file.txt";
my $contents = read_file($inputfile);
my $line = '\N*\n';
if ( $contents =~ /(
$line?
$line?
\N* $searchword \N* \n?
$line?
$line?
)/x) {
say "Found:\n" . $1 ;
}
else {
say "Not found."
}
File::Slurp
prints a reasonable error message if the file isn't present (or something else goes wrong), so I've left out the typical or die...
. Whenever working with regexes - particularly if your trying to match stuff on multiple lines, it pays to use "extended mode" (by putting an 'x' after the final '/') to allow insignificant whitespace in the regex. This allows a clearer layout.
I've also separated out the definition of a line for added clarity which consists of 0, 1 or more non-newlines characters, \N*
, followed by a new line, \n
. However, if your target is on the first, second, second-last or last line I presume you still want the information, so the requested preceding and following pairs of lines are optionally matched. $line?
Please note that regular expressions are pedantic and there are inevitably 'fine details' that effect the definition of a successful match vs an unwanted match - ie. Don't expect this to do exactly what you want in all circumstances. Expect that you'll have to experiment and tweek things a bit.