linux bash automation scripting sh

How to detect the differences between 2 files and repeat the comparison when a special character is found?


I need to detect ALL the differences between 2 files, repeating the comparison each time a special character (the `--` separator in the example below) is found, and print them to a 3rd file.

If file1 is:

a
b
c
d

and file2 is:

1:
b
d
--
2:
a
--
3:
c
a

then the expected output is:

1:
a
c
--
2:
b
c
d
--
3:
b
d

Do you have any suggestions? Everything I tried printed the differences from only one file, not both.

My code:

#!/bin/bash

file1=file1
file2=file2
output_file=Filee

# Compare the files and store the differences in a temporary file
diff_file=$(mktemp)
diff --changed-group-format='%<' --unchanged-group-format='' "$file1" "$file2" > "$diff_file"

# Process the differences and write them to the output file
group_number=1
current_group=""
while IFS= read -r line; do
    if [[ $line == -- ]]; then
        if [[ -n $current_group ]]; then
            echo "--" >> "$output_file"
            ((group_number++))
        fi
    else
        if [[ -z $current_group ]]; then
            echo "$group_number:" >> "$output_file"
        fi
        echo "$line" >> "$output_file"
        current_group=$group_number
    fi
done < "$diff_file"

# Remove the temporary file
rm "$diff_file"

echo "Comparison completed. Results saved to $output_file"

Solution

  • I'm looking forward to your solution. I wonder how you attempted to solve it :-)

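    I've made something that uses regular expressions and also works for multi-character strings (as long as they contain no spaces). For each ID it extracts the matching section of file2 with a regex, then prints the lines of file1 while skipping every line that matches an entry from that section. The comments in the script explain the details.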

    #! /bin/bash
    
    # Disable history expansion (the `!` character)
    set +H
    
    
    readonly ID=(1 2 3)
    readonly FILE_IN=file1
    readonly FILE_TST=file2
    readonly FILE_OUT=file3
    
    get() {
            local id=${1?No ID given to ${FUNCNAME[0]}}
            local d=${2?No delimiter given to ${FUNCNAME[0]}}
            local f=${3?No file given to ${FUNCNAME[0]}}
            # ^                     - start of line (file due to -0 option)
            # (.|\n)*$id:\n         - Skip everything until $id: is found
            # (?:(?!$d)(.|\n))*     - Match everything until delimiter
            # ($d|$)                - Delimiter or end of line (file in this case)
            # (.|\n)*               - The rest (if any)
            perl -0pe "s/^(.|\n)*$id:\n((?:(?!$d)(.|\n))*)($d|$)(.|\n)*/\2\n/" "$f"
    }
    
    # Truncate (or create) the output file
    : > "${FILE_OUT}"
    for i in "${ID[@]}"; do
    
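            # Fetch the entries of section ${i} from "get"
            # (word splitting is intentional: one array element per entry)
            data=($(get "${i}" '--' "${FILE_TST}" ))
    
            # Build one sed delete command per entry instead of one
            # long regexp. Each entry is anchored with ^ and $ so that
            # only whole-line matches are deleted.
            data="$(echo ${data[@]} | tr ' ' '\n'; echo)"
            regxp="$(sed -r 's/^/\/^/; s/$/$\/d;/' <<< "${data}")"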
            
            # Append this section's result to the output file
            cat >> "${FILE_OUT}" <<-EOF
    ${i}:
    $(sed -r "${regxp}" "${FILE_IN}")
    EOF
    done
    
    # Show output
    echo "File ${FILE_OUT} finished"
    
    exit 0
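
    The extraction step in `get` slurps the whole file into a single Perl regex. As a hedged alternative for that one step, the sketch below reads the file line by line with awk instead. The name `get_section` is hypothetical, and it assumes each section starts with a header line of the form `ID:` and ends at the delimiter line or at end of file.

    ```shell
    #!/bin/bash
    # Sketch only: get_section is a hypothetical stand-in for get().
    # Usage: get_section ID DELIMITER FILE
    get_section() {
        local id=$1 delim=$2 file=$3
        awk -v hdr="${id}:" -v sep="$delim" '
            $0 == hdr            { inside = 1; next }  # header line opens the section
            inside && $0 == sep  { exit }              # delimiter closes it
            inside               { print }             # lines between header and delimiter
        ' "$file"
    }
    ```

    With the file2 shown in this answer, `get_section 2 -- file2` should print the single entry of section 2. Because the header is compared as a whole line, an ID of `1` does not accidentally match a header such as `11:`.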
    

    file1

    $ cat file1 
    ab
    bc
    cd
    de
    

    file2

    1:
    bc
    de
    --
    2:
    ab
    --
    3:
    cd
    ab
    

    file3

    1:
    ab
    cd
    2:
    bc
    cd
    de
    3:
    bc
    de
    

    p.s. I'd rather have an up-vote. That silver bash badge is coming my way \0/
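
The file3 shown above does not contain the `--` separators that appear in the question's expected output. The following is a minimal single-pass sketch that keeps them, not a drop-in replacement for the script above. It makes several assumptions: `missing_per_section` is a hypothetical name, awk is available, section headers consist of digits followed by a colon, the reference file (file1) is not empty, and lines are compared as whole literal strings rather than as regexes.

```shell
#!/bin/bash
# Sketch only: missing_per_section is a hypothetical helper.
# Usage: missing_per_section REFERENCE_FILE SECTIONED_FILE [SEPARATOR]
# For every section of SECTIONED_FILE, print the lines of REFERENCE_FILE
# that do not occur in that section.
missing_per_section() {
    local ref_file=$1 sectioned_file=$2 sep=${3:-"--"}
    awk -v sep="$sep" '
        # Print each reference line not seen in the current section, then reset.
        function emit_missing(    i) {
            for (i = 1; i <= n; i++)
                if (!(ref[i] in seen)) print ref[i]
            split("", seen)                       # portable way to clear the array
        }
        FNR == NR    { ref[++n] = $0; next }      # first file: keep reference lines in order
        $0 == sep    { emit_missing(); print; next }  # separator closes the section
        /^[0-9]+:$/  { print; next }              # section header such as "1:"
                     { seen[$0] = 1 }             # ordinary line inside a section
        END          { emit_missing() }           # last section has no trailing separator
    ' "$ref_file" "$sectioned_file"
}
```

It could be called as `missing_per_section file1 file2 > file3`. Exact string comparison means an entry `a` does not also remove a line such as `ab`, and entries may contain spaces.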