Tags: arrays, bash, shell, dd

Bash shell scripting: How to replace characters at specific byte offsets


I'm looking to replace characters at specific byte offsets.

Here's what is provided: An input file that is simple ASCII text. An array within a Bash shell script, each element of the array is a numerical byte-offset value.

The goal: Take the input file, and at each of the byte-offsets, replace the character there with an asterisk.

So essentially the idea I have in mind is to somehow go through the file, byte-by-byte, and if the current byte-offset being read is a match for an element value from the array of offsets, then replace that byte with an asterisk.

This post seems to indicate that the dd command would be a good candidate for this action, but I can't understand how to perform the replacement multiple times on the input file.

Input file looks like this:

00000
00000
00000

The array of offsets looks like this:

offsetsArray=("2" "8" "9" "15")

The output file's desired format looks like this:

0*000
0**00
00*00

Any help you could provide is most appreciated. Thank you!


Solution

  • Please check my comment about the newline offset. Assuming this is correct (note I have changed your offset array), then I think this should work for you:

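    #!/bin/bash
    
    # -d '' sets the delimiter to NUL, which ASCII text does not contain,
    # so read consumes all of stdin and stores it in $REPLY.
    read -r -d ''
    offsetsArray=("2" "8" "9" "15")
    txt="${REPLY}"
    for i in "${offsetsArray[@]}"; do
        # Keep the first i-1 characters, add '*', then append everything from index i on.
        txt="${txt:0:$i-1}*${txt:$i}"
    done
    printf "%s" "$txt"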
    

    Explanation:

    • read -d '' reads the whole input (redirected file) in one go into the $REPLY variable. Because the entire file is held in memory, very large files can exhaust it.
    • We then loop through the offsets array, one element at a time. We use each index i to grab i-1 characters from the beginning of the string, then insert a * character, then add the remaining bytes from offset i. This is done with bash parameter expansion. Note that while your offsets are one-based, strings use zero-based indexing.
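
To make the index arithmetic concrete, here is a minimal sketch of the same expansion applied to a single string. The function name star_at is hypothetical and not part of the script above; it assumes bash, since substring expansion is not available in plain POSIX sh.

```bash
#!/bin/bash
# Hypothetical helper: replace the character at a one-based offset with '*'.
star_at() {
    local s=$1 off=$2
    # ${s:0:off-1} is the first off-1 characters (zero-based start 0);
    # ${s:off} is everything after the character being replaced.
    printf '%s' "${s:0:off-1}*${s:off}"
}

star_at "00000" 2   # prints 0*000
echo
```

With offset 2, the expansion keeps one character, writes the asterisk, and resumes at zero-based index 2, so exactly one character is dropped.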

    In use:

    $ ./replacechars.sh < input.txt
    0*000
    0**00
    00*00
    $ 
    

    Caveat:

    This is not an efficient solution, because the string containing the whole file is copied once for every offset. If you have large files and/or a large number of offsets, it will run slowly. If you need something faster, another language that allows individual characters in a string to be modified in place would be much better.
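
Since the question mentions dd, one possible alternative is to overwrite the bytes in place rather than rebuild the string. The following is only a sketch under stated assumptions: the function name star_offsets is made up for illustration, offsets are one-based and count newlines as in the script above, and the file is modified directly, so it should be run on a copy. It starts one dd process per offset, so it avoids copying the file but is not fast for long offset lists.

```bash
#!/bin/bash
# Hypothetical helper: overwrite single bytes of FILE in place with '*'.
# Usage: star_offsets FILE OFFSET...   (one-based offsets, newlines counted)
star_offsets() {
    local file=$1 off
    shift
    for off in "$@"; do
        # bs=1 with seek=N skips N bytes of the output file before writing;
        # conv=notrunc keeps the rest of the file instead of truncating it.
        # dd prints transfer statistics on stderr, which are discarded here.
        printf '*' | dd of="$file" bs=1 seek=$((off - 1)) conv=notrunc 2>/dev/null
    done
}
```

It could be called as star_offsets output.txt "${offsetsArray[@]}" after copying input.txt to output.txt.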