Compare/Difference of two arrays in Bash


Is it possible to take the difference of two arrays in Bash? If so, what is a good way to do it?

Code:

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" ) 

Array3 =diff(Array1, Array2)

Ideally, Array3 should be:
Array3=( "key7" "key8" "key9" "key10" )

Solution

  • If you strictly want Array1 - Array2, then:

    Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
    Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
    
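    Array3=()
    for i in "${Array1[@]}"; do
        skip=
        # Look for the current element of Array1 anywhere in Array2.
        for j in "${Array2[@]}"; do
            # Quote "$j" so it is compared literally, not as a glob pattern.
            [[ $i == "$j" ]] && { skip=1; break; }
        done
        # Keep the element only if no match was found.
        [[ -n $skip ]] || Array3+=("$i")
    done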
    declare -p Array3
    

    Runtime might be improved with associative arrays, but I personally wouldn't bother. If you're manipulating enough data for that to matter, shell is the wrong tool.
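
    For reference, the associative-array variant mentioned above might look like the following. This is a minimal sketch, assuming bash 4.0 or newer (for declare -A) and non-empty array elements; the lookup table name seen is hypothetical and chosen only for illustration.

    ```shell
    # Assumes bash 4.0+ (associative arrays) and non-empty elements.
    Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
    Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

    # Build a lookup table whose keys are the elements of Array2.
    # "seen" is a hypothetical name used only for this sketch.
    declare -A seen=()
    for j in "${Array2[@]}"; do
        seen[$j]=1
    done

    # Keep each element of Array1 that is not a key in the lookup table.
    Array3=()
    for i in "${Array1[@]}"; do
        [[ -n ${seen[$i]:-} ]] || Array3+=("$i")
    done
    declare -p Array3
    ```

    This replaces the inner loop with a single table lookup per element, so each array is walked only once. The order of Array1 is preserved in Array3.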


    For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).

    Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.

    $ oldIFS=$IFS IFS=$'\n\t'
    $ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
    comm: file 1 is not in sorted order
    $ IFS=$oldIFS
    $ declare -p Array3
    declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'
    

    comm complains because, in lexicographical order, key1 < … < key9 > key10. But since both input arrays are sorted similarly, it's fine to ignore that warning. With GNU comm you can use --nocheck-order to get rid of the warning, or add a | sort -u inside each <(…) process substitution if you can't guarantee the order and uniqueness of the input arrays.
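
    The sort -u suggestion could be sketched as follows. This is one possible arrangement rather than the only one: it assumes bash 4.0 or newer (for mapfile) and elements that contain no tabs or newlines, and it swaps the IFS juggling for printf, tr, and mapfile so that the result is not subject to word splitting or glob expansion.

    ```shell
    # Assumes bash 4.0+ (mapfile) and elements without tabs or newlines.
    Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
    Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

    # printf '%s\n' emits one element per line; sort -u gives comm the
    # sorted, duplicate-free input it expects. comm -3 drops the lines
    # common to both inputs, and tr removes the tab that comm puts in
    # front of lines unique to the second input.
    mapfile -t Array3 < <(
        comm -3 <(printf '%s\n' "${Array1[@]}" | sort -u) \
                <(printf '%s\n' "${Array2[@]}" | sort -u) |
            tr -d '\t'
    )
    declare -p Array3
    ```

    Because the inputs are sorted first, Array3 comes back in sort order (key10 before key7) rather than in the original array order.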