Comparing two arrays by first letters of each element and printing if equal

Let's say I have two arrays, A and B.

In directory A:

#!/bin/bash
A=( $(ls *txt) )

Directory A contains:

fox_abdce.txt
rabbit_abdce.txt
lemom_asnrndna.txt

In directory B:

#!/bin/bash
B=( $(ls *txt) )

Directory B contains:

fox_zzzzzz.txt
rabbit_zzzedd.txt
lemom_kokoijijim.txt

Or, as input declared explicitly (this could be generalized to anything similar):

#!/bin/bash
declare -a A=([0]="fox_abdce.txt" [1]="lemom_asnrndna.txt" [2]="rabbit_abdce.txt")
declare -a B=([0]="fox_zzzzzz.txt" [1]="lemom_kokoijijim.txt" [2]="rabbit_zzzedd.txt")

I want to compare them element by element to find out whether each pair of entries starts with the same first 3 letters.

I would use AWK like this to find out if two columns from a csv file have the same initial three letters:

#!/bin/bash
export NUMBER_OF_DIGITS=3

Matching:

awk -F, '{if(substr($1, 1, $NUMBER_OF_DIGITS) == substr($2, 1, $NUMBER_OF_DIGITS)) print}' file.csv

Not matching:

awk -F, '{if(substr($1, 1, $NUMBER_OF_DIGITS) != substr($2, 1, $NUMBER_OF_DIGITS)) print}' file.csv

How could I apply the same comparison but using the arrays directly?

In this case the output should list every entry that matches, in any of the following forms:

fox_abdce.txt
rabbit_abdce.txt
lemom_asnrndna.txt

fox_zzzzzz.txt
rabbit_zzzedd.txt
lemom_kokoijijim.txt

fox_abdce.txt             fox_zzzzzz.txt
rabbit_abdce.txt          rabbit_zzzedd.txt
lemom_asnrndna.txt        lemom_kokoijijim.txt

Solution

Assumptions:

file names do not include embedded linefeeds
both arrays have the same number of entries
we're to compare array entries that have the same array index
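
The second assumption can be checked before running any comparison. A minimal sketch, assuming the question's sample data; same_length is a hypothetical helper name, not part of the original answer:

```shell
#!/bin/bash
# Sample data taken from the question.
A=("fox_abdce.txt" "rabbit_abdce.txt" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "lemom_kokoijijim.txt")

# same_length is a hypothetical helper (not from the original answer).
# It succeeds only when the two entry counts passed to it are equal.
same_length() {
    [[ "$1" -eq "$2" ]]
}

if same_length "${#A[@]}" "${#B[@]}"; then
    length_check="ok"
else
    length_check="mismatch"
fi
echo "length check: ${length_check}"
```

If the counts differ, comparing by index would pair some entries with an empty string, so it is probably safer to stop at that point.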

Adding a 'not matching' data point:

A=("fox_abdce.txt" "rabbit_abdce.txt" "ignore_me" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "not_me" "lemom_kokoijijim.txt")

Fixing the NUMBER_OF_DIGITS issue:

#### replace this:

NUMBER_OF_DIGITS=(3)

#### with this:

NUMBER_OF_DIGITS=3

#### then feed to awk via a -v flag/arg, e.g.:

awk -v awk_var_name="OS_var_value"
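
Why the -v flag matters: inside a single-quoted awk script the shell never expands $NUMBER_OF_DIGITS, so awk reads it as a field reference through an unset awk variable, which is $0. The length argument then evaluates to 0 for non-numeric lines, both substr() calls return an empty string, and every row appears to match. A small sketch of the -v form, assuming one of the question's file names as input; len is just an illustrative awk variable name:

```shell
#!/bin/bash
NUMBER_OF_DIGITS=3

# Pass the shell value into awk as the awk variable 'len' (illustrative name).
# substr($0, 1, len) then returns the first 'len' characters of the line.
prefix=$(printf '%s\n' "fox_abdce.txt" |
    awk -v len="${NUMBER_OF_DIGITS}" '{ print substr($0, 1, len) }')

echo "${prefix}"
```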

One awk idea using process substitution:

echo "########## matching"

awk -v len="${NUMBER_OF_DIGITS}" '
FNR==NR                                  { a[FNR]=$0; next }
substr(a[FNR],1,len) == substr($0,1,len) { print a[FNR],$0 }
' <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")

echo "########## not matching"

awk -v len="${NUMBER_OF_DIGITS}" '
FNR==NR                                  { a[FNR]=$0; next }
substr(a[FNR],1,len) != substr($0,1,len) { print a[FNR],$0 }
' <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")

This generates:

########## matching
fox_abdce.txt fox_zzzzzz.txt
rabbit_abdce.txt rabbit_zzzedd.txt
lemom_asnrndna.txt lemom_kokoijijim.txt

########## not matching
ignore_me not_me
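
Since the question asks about using the arrays directly, the same comparison could also be sketched without awk, using bash substring expansion. This is an alternative under the same assumptions as above, not the approach of the original answer; the matching and not_matching array names are made up for illustration:

```shell
#!/bin/bash
A=("fox_abdce.txt" "rabbit_abdce.txt" "ignore_me" "lemom_asnrndna.txt")
B=("fox_zzzzzz.txt" "rabbit_zzzedd.txt" "not_me" "lemom_kokoijijim.txt")
NUMBER_OF_DIGITS=3

# Illustrative result arrays (names are not from the original answer).
matching=()
not_matching=()

# Walk the indices of A and compare the leading characters of each pair.
# ${var:0:n} expands to the first n characters of var.
for i in "${!A[@]}"; do
    if [[ "${A[i]:0:NUMBER_OF_DIGITS}" == "${B[i]:0:NUMBER_OF_DIGITS}" ]]; then
        matching+=("${A[i]} ${B[i]}")
    else
        not_matching+=("${A[i]} ${B[i]}")
    fi
done

echo "########## matching"
printf '%s\n' "${matching[@]}"
echo "########## not matching"
printf '%s\n' "${not_matching[@]}"
```

Keeping the results in arrays avoids running the comparison twice, once for each of the two reports.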

A different approach uses paste to join the output of the two process substitutions.

Additional assumption:

file names do not include embedded commas (otherwise we will need to choose a different delimiter for the paste command)

Running paste on its own:

$ paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}")
fox_abdce.txt,fox_zzzzzz.txt
rabbit_abdce.txt,rabbit_zzzedd.txt
ignore_me,not_me
lemom_asnrndna.txt,lemom_kokoijijim.txt

Feeding the paste output to awk:

echo "########## matching"

awk -F, -v len="${NUMBER_OF_DIGITS}" '
substr($1,1,len) == substr($2,1,len)
' <(paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}"))

echo "########## not matching"

awk -F, -v len="${NUMBER_OF_DIGITS}" '
substr($1,1,len) != substr($2,1,len)
' <(paste -d, <(printf "%s\n" "${A[@]}") <(printf "%s\n" "${B[@]}"))

This generates:

########## matching
fox_abdce.txt,fox_zzzzzz.txt
rabbit_abdce.txt,rabbit_zzzedd.txt
lemom_asnrndna.txt,lemom_kokoijijim.txt

########## not matching
ignore_me,not_me
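
If file names may contain commas, the paste approach could be sketched with a tab as the delimiter instead. This assumes file names do not contain tabs; the sample data below is hypothetical, with a comma added to the first name on purpose:

```shell
#!/bin/bash
# Hypothetical sample data: the first name contains a comma on purpose.
A=("fox,abdce.txt" "ignore_me")
B=("fox_zzzzzz.txt" "not_me")
NUMBER_OF_DIGITS=3

# paste joins with a tab by default, so no -d option is needed here;
# awk then splits each line on that tab via -F'\t'.
matching=$(paste <(printf '%s\n' "${A[@]}") <(printf '%s\n' "${B[@]}") |
    awk -F'\t' -v len="${NUMBER_OF_DIGITS}" 'substr($1,1,len) == substr($2,1,len)')

echo "${matching}"
```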