I have a file with lines containing pairs of letter strings such as
ABXF\\CDYG
and a pair of target letters, for example X and Y (the target letters may vary). I would like to locate all the lines where the target letters are in the same position (in this example, both are in position 3 of their respective letter strings). The position could be anywhere, including the very first or the very last one. The two letter strings always have the same length.
How could I do such a search with regular expressions (here, with the Perl grep)?
If that's okay with you, here's a shell script that might do the job.
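#! /bin/sh
# Usage: ./scriptname XY INPUTFILE
Target="${1:?missing target letters}"
File="${2:?missing input filename}"
Previous=''
test "${#Target}" -eq 2 || { echo 'please provide two target letters'; exit 1; }
test -r "$File" || { echo "cannot find file \"$File\""; exit 1; }
# List every occurrence of either target letter as line:byte-offset:letter.
grep -n -b -o -e "${Target%?}\\|${Target#?}" "$File" \
| while read -r Line
  do
      # First occurrence on a new line: remember it and wait for the next one.
      if test "${Line%%:*}" != "${Previous%%:*}"
      then Previous="$Line"
      else
          # Second occurrence on the same line: compare it with the first.
          printf '%s:%s\n' "$Previous" "$Line" \
          | { IFS=':' read -r Line Pos1 Char1 _ Pos2 Char2
              # 6 = 4 letters + 2 backslashes, i.e. same position in both strings.
              test "$(( Pos1 == Pos2 - 6))" -eq 1 \
              && test "$Char1" != "$Char2" \
              && echo "match at line $Line"
            }
          Previous=''
      fi
  done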
Based on the following input data:
ABXF\\CDYG
ZETX\\FCBA
XHCB\\YEIH
BYCT\\ABCD
CYTZ\\AXVH
ABXZ\\CDXV
when you invoke the script like this:
./scriptname XY INPUTFILE
it produces this output:
match at line 1
match at line 3
match at line 5
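Since the question asks for a regular expression, here is a rough sketch of a pure-pattern route. It is not part of the script above: build_pattern is a hypothetical helper, and its third argument (an upper bound on the string length) is my own assumption. It generates one alternative per position and per letter order, still assuming the two strings are joined by two backslashes.

```shell
#!/bin/sh
# build_pattern A B MAXLEN   (hypothetical helper, not part of the original script)
# Prints an extended regular expression with one alternative per
# position k (0 .. MAXLEN-1) and per letter order, each of the form
#   ^.{k}A[^\\]*\\\\.{k}B
# i.e. k characters, A, the rest of the first string, the two
# backslashes, k characters of the second string, then B.
build_pattern() {
    k=0
    pat=''
    while [ "$k" -lt "$3" ]
    do
        for pair in "$1$2" "$2$1"
        do
            # ${pair%?} is the first letter of the pair, ${pair#?} the second.
            pat="${pat}${pat:+|}^.{$k}${pair%?}"'[^\\]*\\\\'".{$k}${pair#?}"
        done
        k=$((k + 1))
    done
    printf '%s\n' "$pat"
}
```

It could then be used as grep -n -E -e "$(build_pattern X Y 4)" INPUTFILE, which on the sample data selects lines 1, 3 and 5. A MAXLEN larger than the real string length only adds alternatives that cannot match.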
The script uses the -o (print only the matched part), -b (print the byte offset of each match) and -n (print the line number) grep options.
Thus grep -n -b -o -e 'X\|Y' INPUTFILE produces, for the first four input lines:
1:2:X
1:8:Y
2:14:X
3:22:X
3:28:Y
4:34:Y
(line:offset:matched expression)
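The script only parses that output, assuming that each letter string is 4 letters long and that the two strings are separated by two backslashes: two target letters in the same position are then exactly 6 bytes apart (8 - 2 = 6 on line 1, 28 - 22 = 6 on line 3). It also only compares the first two target letters found on a line.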
Tested under Debian 11 with GNU grep.
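If the strings can be longer or shorter than 4 letters, something like the following awk sketch might be easier to adapt. It is not the script above and was not part of the test mentioned: same_pos is a hypothetical helper name, and it assumes every line is two equal-length strings joined by a 2-character separator. Unlike the grep-based script, it compares every position rather than only the first two target letters found.

```shell
#!/bin/sh
# same_pos A B FILE   (hypothetical helper, not part of the original script)
# Prints "match at line N" for every line of FILE where A and B
# (in either order) sit at the same position of the two strings.
same_pos() {
    awk -v a="$1" -v b="$2" '
    {
        # Assumption: two equal-length strings joined by a 2-character separator.
        half   = int((length($0) - 2) / 2)
        first  = substr($0, 1, half)
        second = substr($0, half + 3)
        for (i = 1; i <= half; i++) {
            c1 = substr(first, i, 1)
            c2 = substr(second, i, 1)
            if ((c1 == a && c2 == b) || (c1 == b && c2 == a)) {
                print "match at line " NR
                next    # one report per line is enough
            }
        }
    }' "$3"
}
```

Called as same_pos X Y INPUTFILE, it reports lines 1, 3 and 5 for the sample data, the same lines as the script.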
Hope that helps.