Tags: bash, shell, replace, find, gnupg

bash script: find content between specific lines in a file, run a command on it, and replace it with the command's output


I'm new to scripting. So far I've only written very simple scripts using a few variables, ifs, and basic grep, awk, and similar commands.

Q: I have a few thousand files (emails) with cleartext and (sometimes) several independent sections of GPG encrypted text, something like this:

several lines of
cleartext stuff (more specifically: email headers)

-----BEGIN PGP MESSAGE-----
RTDHNRFSGNRTDHNRFSGNRTDHNRFSGN
RTDHNRFSGNRTDHNRFSGNRTDHNRFSGN
-----END PGP MESSAGE-----

some more lines
of cleartext

-----BEGIN PGP MESSAGE-----
WPGLUFPJUWPGLUFPJUWPGLUFPJU
WPGLUFPJUWPGLUFPJUWPGLUFPJU
-----END PGP MESSAGE-----

I'm trying to write a (preferably) bash script that goes through all files in a folder, finds each instance of GPG encrypted text, decrypts it, replaces the old encrypted text with the decrypted text, and then saves the file, so that when the script is done the hypothetical file above looks like this:

several lines of
cleartext stuff (more specifically: email headers)

decrypted message #1

some more lines
of cleartext

decrypted message #2

When I run GPG directly on a file, it skips all the cleartext and outputs only the first decrypted message.

So I think I need something like a while loop that finds each section starting with "-----BEGIN PGP MESSAGE-----" and ending with "-----END PGP MESSAGE-----", runs the GPG command on it, replaces that section with the output of the GPG command, and then continues to the next section of encrypted text.
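A side note for checking files before decrypting anything: the sketch below reports the line range of each PGP block in a file, which makes it easy to see how many independent sections a file contains. The function name `list_pgp_blocks` is made up for illustration, and it assumes the markers start at the beginning of a line, as in the sample above.

```shell
# list_pgp_blocks FILE  (hypothetical helper name)
# Print "start-end" line numbers for every PGP block found in FILE.
list_pgp_blocks() {
    awk '
        # Remember the line number where a block opens.
        /^-----BEGIN PGP MESSAGE-----/ { start = NR }
        # When a block closes, print its range and reset the marker.
        /^-----END PGP MESSAGE-----/ && start { print start "-" NR; start = 0 }
    ' "$1"
}
```

Calling `list_pgp_blocks some-mail-file` prints one range per line; a file that prints two or more ranges is one where running gpg directly on it would only decrypt the first section.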

So far I just have these few lines, but they obviously don't do what I want. I don't want to have to run the script on each individual file, and I'd rather not use a temp file. I guess there's a much better way to do all of this.

#!/bin/bash

TEMPFILE="${1}.tmp"

## grep only the relevant gpg lines to decrypt.
## this will output ALL encrypted instances to $TEMPFILE
sed -n '/^-----BEGIN PGP MESSAGE/,/^-----END PGP MESSAGE/p' "$1" > "$TEMPFILE"

## decrypt. this will only give me the decrypted output
## of the first encrypted instance in $TEMPFILE.
## and I don't know how to shove this into the proper place in the original file.
gpg --batch -d --no-tty --output "${1}.dc.eml" "$TEMPFILE"

## remove $TEMPFILE
rm "$TEMPFILE"

Here is some made-up pseudocode that hopefully explains better what I want to do:

for all files in folder; do
    while i can find an instance of "-----BEGIN PGP" to "-----END PGP"; do
        command: gpg decrypt > $tempvar
        command: replace the instance of "-----BEGIN PGP" to "-----END PGP" with $tempvar
    end while
end for

This is probably pretty simple to achieve (I hope) but I've been at this decryption dilemma for days now and I can't properly figure out how to do it. Any help or hints towards the right direction will be of great help to me.

EDIT: final code, thanks to glenn jackman:

for file in *; do
    in_pgp_section=false
    pgp_text=""

    while IFS= read -r line; do
        if [[ $line == *BEGIN\ PGP\ MESSAGE* ]]; then
            in_pgp_section=true
        fi

        if ! $in_pgp_section; then
            # read strips the newline, so add it back when echoing cleartext
            printf '%s\n' "$line"
            continue
        fi

        pgp_text+="$line"$'\n'

        if [[ $line == *END\ PGP\ MESSAGE* ]]; then
            printf "%s" "$pgp_text" | gpg --batch -d --no-tty --use-agent
            in_pgp_section=false
            pgp_text=""
        fi
    done < "$file" > "$file.decrypted"
done
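One thing to watch with the loop above: it globs `*` and writes `$file.decrypted` into the same directory, so a second run would also pick up its own output files. A minimal sketch of a guard, assuming the suffixes `.decrypted`, `.decrypting`, and `.encrypted` are only ever used for script output (the helper name `needs_decrypting` is made up):

```shell
# needs_decrypting FILE  (hypothetical helper name)
# Succeeds only for regular files that are not leftovers from a previous
# run and that contain at least one PGP block.
needs_decrypting() {
    [ -f "$1" ] || return 1
    case $1 in
        # Assumption: these suffixes are reserved for the script's own output.
        *.decrypted|*.decrypting|*.encrypted) return 1 ;;
    esac
    grep -q '^-----BEGIN PGP MESSAGE-----' "$1"
}
```

Inside the loop this would be used as the first statement, e.g. `needs_decrypting "$file" || continue`, so files without encrypted sections are left alone.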

Solution

  • untested

    for file in *; do
        in_pgp_section=false
        pgp_text=""
    
        # IFS= and -r keep leading whitespace and backslashes intact
        while IFS= read -r line; do
            if [[ $line == "-----BEGIN PGP MESSAGE-----" ]]; then
                in_pgp_section=true
            fi
    
            if ! $in_pgp_section; then
                # printf is safer than echo for lines such as "-n" or "-e"
                printf '%s\n' "$line"
                continue
            fi
    
            pgp_text+="$line"$'\n'
    
            if [[ $line == "-----END PGP MESSAGE-----" ]]; then
                printf "%s" "$pgp_text" | gpg -d
                in_pgp_section=false
                pgp_text=""
            fi
        done < "$file" > "$file.decrypting"
    
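        # keep the original under a ".encrypted" name (hard link),
        # then move the decrypted copy into place
        ln "$file" "$file.encrypted" &&
        mv "$file.decrypting" "$file"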
    done
    

    This should decrypt all the PGP sections in every file in the current directory, and keep a copy of each original file with a ".encrypted" extension.
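    Since the loop is untested, it may help to try the block-replacement logic with a harmless stand-in command before pointing it at gpg. Below is a rough sketch of the same idea as a reusable function; the name `filter_pgp_blocks` is invented here, and the fallback of keeping the original block when the command fails is an addition of this sketch, not part of the loop above.

```shell
#!/bin/bash
# filter_pgp_blocks CMD [ARGS...]  (hypothetical helper name)
# Copies stdin to stdout, piping each PGP MESSAGE block through CMD.
# If CMD exits non-zero, the original block is printed unchanged.
filter_pgp_blocks() {
    local line block="" in_block=false out
    # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == "-----BEGIN PGP MESSAGE-----" ]] && in_block=true
        if ! $in_block; then
            printf '%s\n' "$line"      # cleartext: copy through as-is
            continue
        fi
        block+="$line"$'\n'            # collect the encrypted block
        if [[ $line == "-----END PGP MESSAGE-----" ]]; then
            # $(...) strips trailing newlines; one is added back below.
            if out=$(printf '%s' "$block" | "$@"); then
                printf '%s\n' "$out"
            else
                printf '%s' "$block"   # command failed: keep the block
            fi
            in_block=false
            block=""
        fi
    done
    # An unterminated block at end of input is passed through untouched.
    printf '%s' "$block"
}
```

    A dry run could use `filter_pgp_blocks tr a-z A-Z < "$file"`, which upper-cases the block contents and leaves the cleartext alone. Once that looks right, the real call would be along the lines of `filter_pgp_blocks gpg --batch -d --no-tty < "$file" > "$file.decrypting"`.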