BASH Script to find a string in a file by position, match, then modify that position and insert if it exists


I have several lines in a file (input.in) that may look like this (asterisks are not literal; added for emphasis):

200928,121546,00002,**0000004015K**,**0000000641}**,00102020
200928,121546,00002,**0000000227B**,**0000000970R**,84839923
200928,121546,00003,**0000001197A**,**0000000227B**,93877763

I need to examine the last character of the fourth and fifth fields (positions 31 and 43 of the line) to determine what the final digit should be and whether the number is positive or negative. After the modifications, the result should look like this:

200928,121546,00002,-00000040152,-00000006410,00102020
200928,121546,00002,00000002272,-00000009709,84839923
200928,121546,00003,00000011971,00000002272,93877763

The substitutions are:

  • {ABCDEFGHI mark a positive field and are replaced with 0123456789, respectively
  • }JKLMNOPQR mark a negative field and are replaced with 0123456789, respectively

I'm able to get all the positive number conversions working correctly but I am having problems with the negative conversions.

My code for the positive substitutions looks roughly like this (this is a "packed field" conversion, by the way):

sed -i -E "s/^(.{$a})\{/\10/" input.in

This is for the { positive case where the sub will be 0.

Here $a comes from a for a in 30 42 loop (the zero-based offsets of positions 31 and 43). I have no trouble identifying and updating the last character of each field, but I can't figure out how to flip the sign only when one of the negative characters is found. I was thinking of looking at the entire 11-character group (the 4th and 5th fields) and, if its last character is one of }JKLMNOPQR, inserting - at the first position and replacing }JKLMNOPQR with 0123456789, respectively. I'm stuck here, though. The objective, of course, is to update the file once all the substitutions have been made.

Code sample:

    input="input.in"
        for a in 30 42
            do
                while IFS= read -r line
                do
                echo "${line:$a:1} found, converting"
                edbvalue=${line:$a:1}
                case $edbvalue in
                        {)
                        echo -n -e "{ being replaced with 0\n"
                        sed -i -E "s/^(.{$a})\{/\10/" input.in
                        ;;

                        A)
                        echo -n -e "A being replaced with 1\n"
                        sed -i -E "s/^(.{$a})A/\11/" input.in
                        ;;
                        .
                        .
                        .
                        R)
                        echo -n -e "R being replaced with 9\n"
                        sed -i -E "s/^(.{$a})R/\19/" input.in
                        ;;

                        *)
                        echo -n -e "no conversion needed\n"
                        ;;
                esac
                done < "$input"
            done
            

Solution

  • Rewriting the input file repeatedly is horrendously inefficient. You want to perform all the replacements in one go.

    sed is rather hard to read once you start doing nontrivial things, so I would recommend switching to Awk (or a proper modern scripting language like Python if you want to invest more into this).

    awk -F , 'BEGIN { OFS=FS
        pos = "{ABCDEFGHI"; neg = "}JKLMNOPQR";
        # Map each sign-carrying character to the digit it stands for
        for (i=0; i<10; ++i) { p[substr(pos, i+1, 1)] = i; n[substr(neg, i+1, 1)] = i }
    }
    { for (i=4; i<=5; i++) {
        where = length($i)
        what = substr($i, where, 1)
        if (what ~ "^[" pos "]$") sign = ""
        else if (what ~ "^[" neg "]$") sign = "-"
        else {
            # Leave an unrecognized field untouched instead of reusing a stale sign
            print "Error: field " i " " $i " malformed" >"/dev/stderr"
            continue
        }
        $i = sign substr($i, 1, where-1) (sign ? n[what] : p[what])
        }
    }1' input.in
    

    Demo: https://ideone.com/z8wK0V

    This isn't entirely obvious, but here's a quick breakdown.

    In the BEGIN block, we create two associative arrays, such that

    p["{"] = 0, n["}"] = 0
    p["A"] = 1, n["J"] = 1
    p["B"] = 2, n["K"] = 2
    p["C"] = 3, n["L"] = 3
    p["D"] = 4, n["M"] = 4
    p["E"] = 5, n["N"] = 5
    p["F"] = 6, n["O"] = 6
    p["G"] = 7, n["P"] = 7
    p["H"] = 8, n["Q"] = 8
    p["I"] = 9, n["R"] = 9
    

    (We also set OFS to FS so that Awk will print the output comma-separated, like it reads the input.)

    Down in the main block, we loop over fields 4 and 5, extracting the last character and mapping it to the corresponding entry from the correct one of the two arrays, and add a sign if warranted.
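
    To make the per-field step concrete, here is a minimal sketch of the same logic in plain shell for a single field. The function name decode_field is hypothetical and not part of the Awk script; it only illustrates how the last character selects both the sign and the final digit.

```shell
# Hypothetical helper mirroring the per-field logic of the Awk script.
# Prints the decoded field, or returns 1 if the last character is not recognized.
decode_field() {
    local field="$1" pos='{ABCDEFGHI' neg='}JKLMNOPQR'
    local body last prefix
    [ -n "$field" ] || return 1
    body=${field%?}          # everything except the last character
    last=${field#"$body"}    # the last character
    case $pos in
        *"$last"*)
            # The text before the match has as many characters as the digit value
            prefix=${pos%%"$last"*}
            printf '%s%s\n' "$body" "${#prefix}"
            return 0 ;;
    esac
    case $neg in
        *"$last"*)
            prefix=${neg%%"$last"*}
            printf '%s%s%s\n' '-' "$body" "${#prefix}"
            return 0 ;;
    esac
    echo "Error: field $field malformed" >&2
    return 1
}

decode_field '0000004015K'    # negative: K stands for 2
decode_field '0000001197A'    # positive: A stands for 1
```

    The two calls print -00000040152 and 00000011971, matching the expected output in the question. For a whole file, the Awk script remains the better tool; a shell loop calling this per field would be much slower.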

    This simply writes to standard output; save to a new file and move it back over the original input file, or if you have GNU Awk, explore its -i inplace option.
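
    The "save to a new file and move it back" step could be sketched as below. The helper name rewrite_file is an assumption made for illustration, and it relies on mktemp being available, which is common but not guaranteed everywhere.

```shell
# Hypothetical helper (not from the original answer): run a filter over a file
# and replace the file only if the filter exits successfully.
# Usage: rewrite_file FILE COMMAND [ARGS...]
rewrite_file() {
    local file="$1" tmp
    shift
    tmp=$(mktemp) || return 1
    # The command reads the original file on stdin and writes to the temp file
    if "$@" <"$file" >"$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

    Because the helper feeds the file on standard input, the Awk command would be passed without its trailing input.in argument, as in rewrite_file input.in awk -F , '...'. Note that mv replaces the original file with the temporary one, so the result carries the temporary file's ownership and permissions.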

    If you really wanted to do this in sed, it offers a rather convenient y/{ABCDEFGHI/0123456789/ but picking apart the fields and then reassembling the line when you are done is not going to be pleasant.
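
    For data as regular as the sample, a sed version can sidestep the field splitting, under one loud assumption: the characters {ABCDEFGHI}JKLMNOPQR appear nowhere in a line except as the last character of a packed field. The function name convert_with_sed is hypothetical. This is a sketch, not a general replacement for the Awk script.

```shell
# Sketch only: assumes the characters {ABCDEFGHI}JKLMNOPQR occur nowhere in a
# line except as the last character of a packed field.
convert_with_sed() {
    # 1st expression: put "-" in front of any comma-preceded run of digits
    #                 that ends in a negative marker.
    # 2nd expression: translate every marker to its digit.
    sed -E \
        -e 's/,([0-9]+[}JKLMNOPQR])/,-\1/g' \
        -e 'y/{ABCDEFGHI}JKLMNOPQR/01234567890123456789/'
}

printf '%s\n' '200928,121546,00002,0000004015K,0000000641},00102020' | convert_with_sed
```

    On the first sample line this prints 200928,121546,00002,-00000040152,-00000006410,00102020. Unlike the Awk script, it does not restrict itself to fields 4 and 5 and reports nothing for malformed fields, so it is only appropriate when the assumption above is known to hold.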