I need to check from a bash script whether the content of one file is contained inside another file, for a given multiline pattern and input file.
Return value:
I want to receive an exit status (as with the grep command): 0 if any match was found, 1 if no match was found.
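For example, the intended use would be roughly this (multi_line_match.sh is just a placeholder name for whatever script implements the check):
./multi_line_match.sh pattern.txt file.txt
echo $?    # prints 0 if pattern.txt occurs in file.txt as consecutive lines, 1 otherwise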
Pattern and explanation:
Only the following examples should match (the pattern is the first column of the tables below):
pattern   file1   file2   file3   file4
222       111     111     222     222
333       222     222     333     333
          333     333             444
          444
The following should not match:
pattern file1 file2 file3 file4 file5 file6 file7
222 111 111 333 *222 111 111 222
333 *222 222 222 *333 222 222
333 333* 444 111 333
444 333 333
Here's my script:
#!/bin/bash
function writeToFile {
    if [ -w "$1" ] ; then
        echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}

function writeOnceToFile {
    pcregrep --color -M "$2" "$1"
    #echo $?
    if [ $? -eq 0 ]; then
        echo This file contains text that was added previously
    else
        writeToFile "$1" "$2"
    fi
}
file=file.txt
# file.txt currently contains these lines:
#1?1
#2?2
#3?3
#4?4
pattern=`cat pattern.txt`
# pattern.txt contains these lines:
#2?2
#3?3
writeOnceToFile "$file" "$pattern"
I can run grep for each line of the pattern separately (see the sketch after these examples), but that fails with this example:
file.txt:
#1?1
#2?2
#=== added line
#3?3
#4?4
pattern.txt:
#2?2
#3?3
or even if you swap lines 2 and 3 of file.txt:
file.txt:
#1?1
#3?3
#2?2
#4?4
it returns 0 when it shouldn't.
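Roughly, the per-line approach I mean looks like this (a simplified sketch, with a made-up script name): it exits 0 for both examples above, because it never checks that the pattern lines are adjacent and in the right order.
#!/bin/bash
# check_lines.sh pattern_file target_file   (illustrative only)
# Tests each pattern line independently, so order and adjacency are ignored.
while IFS= read -r line; do
    grep -qxF -- "$line" "$2" || exit 1
done < "$1"
exit 0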
How can I fix it? Note that I prefer to use natively installed programs (ideally without pcregrep). Maybe sed or awk can solve this problem?
I have a working version using perl.
I thought I had it working with GNU awk, but I didn't: setting RS to the empty string splits the input on blank lines. See the edit history for the broken awk version.
"How can I search for a multiline pattern in a file?" shows how to use pcregrep, but I can't see a way to get it to work when the pattern to search for may contain regex special characters. Its -F fixed-string mode doesn't usefully combine with multi-line mode: it still treats the pattern as a set of lines to be matched separately, not as one multi-line fixed string. I see you were already using pcregrep in your attempt.
BTW, I think you have a bug in your code in the non-sudo case:
function writeToFile {
    if [ -w "$1" ] ; then
        "$2" >> "$1"    # probably you mean: echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}
Anyway, attempts at using line-based tools have met with failure, so it's time to pull out a more serious programming language that doesn't force the newline convention on us. Just read both files into variables, and use a non-regex search:
#!/usr/bin/perl -w
# multi_line_match.pl pattern_file target_file
# exit(0) if a match is found, else exit(1)
#use IO::File;
use File::Slurp;
my $pat = read_file($ARGV[0]);
my $target = read_file($ARGV[1]);
if ((substr($target, 0, length($pat)) eq $pat) or index($target, "\n".$pat) >= 0) {
    exit(0);
}
exit(1);
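If it helps, wiring this into the original writeOnceToFile could look roughly like the sketch below (my adaptation, untested on your setup; it passes pattern.txt to the helper directly instead of the expanded $pattern variable, and keeps writeToFile for the append):
function writeOnceToFile {
    # use the perl helper's exit status instead of pcregrep
    if ./multi_line_match.pl pattern.txt "$1"; then
        echo "This file contains text that was added previously"
    else
        writeToFile "$1" "$2"
    fi
}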
See "What is the best way to slurp a file into a string in Perl?" to avoid the dependency on File::Slurp (which isn't part of the standard perl distribution, or of a default Ubuntu 15.04 system). I went for File::Slurp partly for readability, for non-perl-geeks, of what the program is doing, compared to:
my $contents = do { local(@ARGV, $/) = $file; <> };
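If you do want to drop the File::Slurp dependency, a rough equivalent of the script above as a module-free one-liner, callable from the bash script, could look like this (the multiline_match wrapper name is mine; it just localizes $/ to slurp both files and repeats the same substring test):
multiline_match() {
    perl -e '
        local $/;                                    # undef $/ => read whole files at once
        open my $p, "<", shift or die "pattern: $!";
        open my $t, "<", shift or die "target: $!";
        my $pat = <$p>;  my $target = <$t>;
        exit 0 if substr($target, 0, length $pat) eq $pat
               || index($target, "\n" . $pat) >= 0;
        exit 1;
    ' "$1" "$2"
}
# e.g.  multiline_match pattern.txt file.txt && echo "already present"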
I was working on avoiding reading the full file into memory, with an idea from http://www.perlmonks.org/?node_id=98208. I think non-matching cases would usually still read the whole file at once. Also, the logic was pretty complex for handling a match at the front of the file, and I didn't want to spend a long time testing to make sure it was correct for all cases. Here's what I had before giving up:
#IO::File->input_record_separator($pat);
$/ = $pat; # pat must include a trailing newline if you want it to match one
my $fh = IO::File->new($ARGV[2], O_RDONLY)
or die 'Could not open file ', $ARGV[2], ": $!";
$tail = substr($fh->getline, -1); #fast forward to the first match
#print each occurrence in the file
#print IO::File->input_record_separator while $fh->getline;
#FIXME: something clever here to handle the case where $pat matches at the beginning of the file.
do {
    # fixme: need to check defined($fh->getline)
    if (($tail eq '\n') or ($tail = substr($fh->getline, -1))) {
        exit(0); # if there's a 2nd line
    }
} while($tail);
exit(1);
$fh->close;
Another idea was to filter the pattern and the files to be searched through tr '\n' '\r' or something similar, so they would all become single lines. (\r being a likely safe choice that wouldn't collide with anything already in a file or a pattern.)
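A rough sketch of that idea (my own, lightly tested at best; the cr_match.sh name is made up): flatten both the pattern and the target with tr, then let grep -F search for the pattern as a single-line fixed string. Prefixing both sides with a \r anchors the match to the start of a line, much like the "\n".$pat trick in the perl version.
#!/bin/bash
# cr_match.sh pattern_file target_file   (illustrative only)
# Turn every newline into \r so the multi-line pattern becomes one long line,
# then search for it as a fixed string; the leading \r on both the pattern and
# the target means a match can only start at the beginning of a line.
pat=$(printf '\r'; tr '\n' '\r' < "$1")
{ printf '\r'; cat -- "$2"; } | tr '\n' '\r' | grep -qF -- "$pat"
# the script's exit status is grep's: 0 if found, 1 if not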