GREP Locating target letters in identical positions

I have a file with lines containing pairs of letter strings such as

ABXF\\CDYG

and a pair of target letters, for example X and Y (the target letters may vary). I would like to locate all the lines where the target letters are in the same position (in this example, both are in position 3 of their respective letter strings). The position could be anywhere, including the very first or the very last. The two letter strings always have the same length.

How could I do such a search with regular expressions (here, the Perl grep)?

Solution

If that's okay with you, here's a shell script that might do the job.

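#! /bin/sh

# Usage: ./scriptname XY INPUTFILE
Target="${1:?missing target letters}"
File="${2:?missing input filename}"
Previous=''

test "${#Target}" -eq 2   ||   { echo 'please provide two target letters' >&2; exit 1; }
test -r "$File"   ||   { echo "cannot read file \"$File\"" >&2; exit 1; }

# Print every occurrence of either target letter as line:offset:letter.
grep -n -b -o -e "${Target%?}\\|${Target#?}" "$File" \
  | while read -r Line
    do    # First hit on a new line number: remember it and wait for its partner.
          if test "${Line%%:*}" != "${Previous%%:*}"
             then Previous="$Line"
          else
             # Second hit on the same line: compare offsets and letters.
             printf '%s:%s\n' "$Previous" "$Line" \
               | { IFS=':' read -r Line Pos1 Char1 _ Pos2 Char2
                   # 6 = 4 letters per string + the 2-character separator.
                   test "$(( Pos1 == Pos2 - 6))" -eq 1 \
                     && test "$Char1" != "$Char2"      \
                     && echo "match at line $Line"
                 }
             Previous=''
          fi
    done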

Based on the following input data:

ABXF\\CDYG
ZETX\\FCBA
XHCB\\YEIH
BYCT\\ABCD
CYTZ\\AXVH
ABXZ\\CDXV

when you invoke the script like this:

./scriptname XY INPUTFILE

it produces this output:

match at line 1
match at line 3
match at line 5

Explanation

The script uses the -n, -b and -o grep options.

'-n' prefixes every match with its line number
'-b' prefixes every match with its byte offset, counted from the start of the file
'-o' prints only the matched text, one output line per occurrence in a given line

Thus grep -n -b -o -e 'X\|Y' INPUTFILE produces:

1:2:X
1:8:Y
2:14:X
3:22:X
3:28:Y
4:34:Y
5:45:Y
5:51:X
6:57:X
6:63:X

(line:offset:matched expression)
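
To illustrate how one such record can be taken apart, here is a minimal sketch that splits it on the colons with read. The function name parse_match is only an assumption made for this illustration; it does not appear in the script.

```shell
#!/bin/sh
# parse_match is a hypothetical helper used only for this illustration.
parse_match() {
  # Split "line:offset:match" on colons into three shell variables.
  IFS=':' read -r line offset char <<EOF
$1
EOF
  printf 'line=%s offset=%s char=%s\n' "$line" "$offset" "$char"
}

parse_match '3:28:Y'    # prints: line=3 offset=28 char=Y
```

This works because the matched text is a single letter; a match that itself contained a colon would need different handling.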

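The script simply parses that output and applies this rule:

IF PreviousLine == CurrentLine
AND PreviousOffset + 6 == CurrentOffset
AND the matched letters are different
THEN there's a match

The 6 comes from the sample data: each string is 4 letters long and the separator is 2 characters, so identical positions are always 6 bytes apart. For strings of another length that constant has to be changed. The script also compares only the first two hits on a line, so a line where a target letter occurs more than once in the same string may be missed.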
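
The fixed offset of 6 only fits strings of four letters. As a possible variation (a sketch, not the script above), the same check can be done per line with awk, which makes it independent of the string length. The name find_same_pos is hypothetical, and the sketch assumes every line has the form LEFT\\RIGHT with a literal two-backslash separator.

```shell
#!/bin/sh
# Sketch only: find_same_pos is a hypothetical helper, not part of the script above.
# Assumes each line is LEFT\\RIGHT with a literal two-backslash separator.
find_same_pos() {
  # $1 = the two target letters (e.g. XY), $2 = input file
  awk -v a="${1%?}" -v b="${1#?}" '
    {
      sep = index($0, "\\\\")            # position of the two-backslash separator
      if (sep == 0) next                 # skip lines without a separator
      left  = substr($0, 1, sep - 1)
      right = substr($0, sep + 2)
      for (i = 1; i <= length(left); i++) {
        l = substr(left, i, 1)
        r = substr(right, i, 1)
        # Same position, one target letter on each side, in either order.
        if ((l == a && r == b) || (l == b && r == a)) {
          print "match at line " NR
          next
        }
      }
    }
  ' "$2"
}
```

Called as find_same_pos XY INPUTFILE on the sample data, it should report lines 1, 3 and 5, the same as the script. Because it walks every position, it also copes with a target letter that appears several times in one string.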

Tested under Debian 11 with GNU grep.

Hope that helps.