Tags: bash, shell, compare, export-to-csv, md5sum

Compare files in a shell script with md5sum and create a CSV for the changed file


I am very new to shell scripting, and I found a way to compare files in a shell script using md5sum.

I want to compare the Options_old and Options_new files in a shell script and identify the TICKER field values that were added in the new file. For these new TICKER values I want to create a CSV file.

For example, comparing Options_old with Options_new shows that the new file contains two new TICKER values, 510051 2 C2.50 and 510052 2 P2.50. I want to write these values to a CSV file and print them.

Options_new.out.gz file

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
TICKER
EXCH_CODE
END-OF-FIELDS
TIMESTARTED=Wed Feb 12 19:30:38 JST 2020
START-OF-DATA
510051 CH 02/26/20 C2.5 Equity|0|75|510051 2 C2.50|CH
510052 CH 02/26/20 P2.5 Equity|0|75|510052 2 P2.50|CH
510050 CH 02/26/20 C2.55 Equity|0|75|510050 2 C2.55|CH
510050 CH 02/26/20 P2.55 Equity|0|75|510050 2 P2.55|CH
END-OF-DATA
DATARECORDS=1140
TIMEFINISHED=Wed Feb 12 19:32:50 JST 2020
END-OF-FILE

Options_old.out.gz file

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
TICKER
EXCH_CODE
END-OF-FIELDS
TIMESTARTED=Wed Feb 12 19:30:38 JST 2020
START-OF-DATA
510050 CH 02/26/20 C2.5 Equity|0|75|510050 2 C2.50|CH
510050 CH 02/26/20 P2.5 Equity|0|75|510050 2 P2.50|CH
510050 CH 02/26/20 C2.55 Equity|0|75|510050 2 C2.55|CH
510050 CH 02/26/20 P2.55 Equity|0|75|510050 2 P2.55|CH
END-OF-DATA
DATARECORDS=1140
TIMEFINISHED=Wed Feb 12 19:32:50 JST 2020
END-OF-FILE
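In both samples the data rows sit between START-OF-DATA and END-OF-DATA and are pipe-delimited, so the value the question calls the ticker (for example `510051 2 C2.50`) is the 4th `|`-separated field, and EXCH_CODE (`CH`) is the 5th. A minimal sketch of pulling out only that field, assuming the inputs are ordinary gzip files and every data row has the same five-field layout as the samples; the function name `extract_tickers` is hypothetical:

```shell
#!/bin/sh
# extract_tickers FILE.gz   (hypothetical helper)
# Prints the TICKER column (4th '|'-separated field) of every row between
# START-OF-DATA and END-OF-DATA, one value per line.
extract_tickers() {
    gzip -dc "$1" | awk -F'|' '
        /^START-OF-DATA/   { in_data = 1; next }   # data begins after this marker
        /^END-OF-DATA/     { in_data = 0; next }   # and ends at this one
        in_data && NF >= 4 { print $4 }            # 4th field holds the ticker
    '
}
```

For example, `extract_tickers /opt/new/Options_new.out.gz` would print one ticker per line, starting with `510051 2 C2.50` for the sample data.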

I have started the script, but I don't understand how to compare only that particular field and then generate the CSV file:

#!/bin/sh

OLD_PATH="/opt/old"
NEW_PATH="/opt/new"
EXIT=0

FILES="Options_new.out.gz Options_old.out.gz"

for FILE in ${FILES}
do
   # md5sum prints "<checksum>  <file name>"; keep only the checksum
   MD5SUM_OLD=$(md5sum "${OLD_PATH}/${FILE}" | awk '{print $1}')
   MD5SUM_NEW=$(md5sum "${NEW_PATH}/${FILE}" | awk '{print $1}')

   if [ "${MD5SUM_NEW}" != "${MD5SUM_OLD}" ]; then
      echo "Found new version of ${FILE}"
      # Currently I am comparing the data of the whole file, but I want to
      # compare only the TICKER values in both files.

      # Here: create a new CSV file with the new TICKER values found in
      # Options_new.out.gz.
   fi
done

exit ${EXIT}
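For the intended comparison, one option is to skip the whole-file hash and compare only the TICKER column: decompress each file, extract the 4th pipe-separated field from the data section, sort both lists, and let `comm -13` print the lines that appear only in the new list. This is a sketch, not a drop-in replacement: the function names, the output file name and the single-column CSV layout with a `TICKER` header are assumptions, and `md5sum` is assumed to be the GNU coreutils tool. Hashing the extracted tickers rather than the `.gz` file also avoids false positives, because gzip normally stores the original file name and modification time in its header and the TIMESTARTED line changes between runs, so identical ticker data can still produce different file checksums.

```shell
#!/bin/sh
# Sketch: write the TICKER values that appear in NEW but not in OLD to a CSV.
# Assumes gzip inputs in the layout shown above, with the ticker as the
# 4th '|'-separated field of each row in the data section.

# Print the sorted, de-duplicated TICKER values of one .gz file.
sorted_tickers() {
    gzip -dc "$1" | awk -F'|' '
        /^START-OF-DATA/   { in_data = 1; next }
        /^END-OF-DATA/     { in_data = 0; next }
        in_data && NF >= 4 { print $4 }
    ' | LC_ALL=C sort -u
}

# new_tickers_csv OLD.gz NEW.gz OUT.csv   (hypothetical helper)
# Returns 1 if both ticker lists are identical. Otherwise writes OUT.csv
# (a TICKER header plus any tickers found only in NEW) and returns 0.
new_tickers_csv() {
    old_list=$(mktemp) || return 2
    new_list=$(mktemp) || { rm -f "$old_list"; return 2; }
    sorted_tickers "$1" > "$old_list"
    sorted_tickers "$2" > "$new_list"

    # The question's md5sum idea, applied only to the ticker lists.
    old_sum=$(md5sum < "$old_list" | awk '{print $1}')
    new_sum=$(md5sum < "$new_list" | awk '{print $1}')
    if [ "$old_sum" = "$new_sum" ]; then
        rm -f "$old_list" "$new_list"
        return 1
    fi

    # comm -13 suppresses lines only in OLD (column 1) and lines in both
    # (column 3), leaving lines only in NEW. Inputs must share one sort order.
    {
        echo "TICKER"
        LC_ALL=C comm -13 "$old_list" "$new_list"
    } > "$3"
    rm -f "$old_list" "$new_list"
    return 0
}
```

Usage with the paths from the question would be `new_tickers_csv /opt/old/Options_old.out.gz /opt/new/Options_new.out.gz new_tickers.csv`; for the sample data the CSV would contain the `TICKER` header followed by `510051 2 C2.50` and `510052 2 P2.50`. Because the ticker values contain spaces but no commas or quotes, no CSV quoting is needed for a single column like this.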

Solution

  • Food for thought: this script runs diff to check whether the two files differ and, if they do, prints the changed lines reduced to the parts you said you wanted to save to CSV.

    #!/bin/bash
    
    # Usage: ./compare.sh NEW_FILE OLD_FILE  (script name is only an example)
    new="$1"
    old="$2"
    
    # diff -q prints "Files ... differ" and exits with status 1 when the
    # files differ, and prints nothing and exits 0 when they are identical.
    if ! diff -q "$old" "$new" > /dev/null
    then
        # Lines that diff prefixes with ">" exist only in the new file.
        # awk keeps those lines, splits them on whitespace and prints
        # fields 2, 5 and 7 joined with commas.
        addedSyn=$(diff "$old" "$new" | awk '$1 == ">" {print $2","$5","$7}')
        # Same again for lines prefixed with "<", which exist only in the old file.
        removedSyn=$(diff "$old" "$new" | awk '$1 == "<" {print $2","$5","$7}')
        echo "$addedSyn"
        echo "$removedSyn"
    else
        echo "No change"
    fi
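The diff approach from the answer can also be pointed at the gzipped inputs directly. A sketch, assuming bash (for `<( ... )` process substitution) and the pipe-delimited layout from the question: splitting on `|` instead of whitespace makes awk's `$4` the full ticker value (`510051 2 C2.50`), whereas splitting on whitespace, as in the answer above, turns the first added row into `510051,C2.5,2`. The function name `added_tickers` and the default output name `added_tickers.csv` are hypothetical.

```shell
#!/bin/bash
# Sketch: diff the decompressed files and keep the TICKER (4th '|' field)
# of every data row that exists only in the new file.
# Usage (hypothetical): ./added_tickers.sh OLD.gz NEW.gz [OUT.csv]

added_tickers() {
    local old=$1 new=$2
    # Lines starting with "> " are present only in the second (new) input.
    # Header lines such as TIMESTARTED=... contain no '|', so NF >= 5 skips them.
    diff <(gzip -dc "$old") <(gzip -dc "$new") |
        awk -F'|' '/^> / && NF >= 5 { print $4 }'
}

if [ "$#" -ge 2 ]; then
    out=${3:-added_tickers.csv}
    { echo "TICKER"; added_tickers "$1" "$2"; } > "$out"
    echo "Wrote $out"
fi
```

One design caveat: diff compares line positions, so a row that merely moved within the file can be reported as both removed and added. Comparing sorted ticker lists (for example with `comm`) is insensitive to row order, which may suit this data better.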