regex email-headers procmail

Using Procmail/Formail/Regex to correct an error in email headers


I am trying to remove an unwanted > character that appears at the start of the "From " line (as ">From ") in the headers of some old archived emails, but I have been unable to do so by rewriting the From line with a Procmail recipe.

Error reproduced:

>From "[email protected]" Sat Dec  4 11:01:29 2004
Status: RO
From: "xxxxxx" <[email protected]>
Subject: Desktop Alert Utility
To: '[email protected]'; '[email protected]'
Date: Sat, 04 Dec 2004 05:31:29 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="--boundary-LibPST-iamunique-1531497257_-_-"

The following does not work:

:0 fhw
| formail -I">From " -a"From "

Even the following does not work:

:0 fhw
| formail -I">From "

What am I doing wrong? Will be happy to share any relevant information.

Note: Because of the unnecessary > before From in the first line of the email header, the mail client shows the email with "No sender" and does not show other details in the summary view. It shows the whole message in the body.

I also tried

LC_ALL=C find . -type f -name ‘*.*’ -exec sed -i '' s/'>From'/'From'/ {} +

but it did not return the result needed.

I am running macOS Mojave.


New note: While my original question is answered below, the extended discussion of applying sed to achieve results has led to a new question at the link below:

Removing unwanted character from the first line of files in a “maildir”


Solution

  • > is not syntactically a valid header character, so I doubt you can persuade formail to treat it as one.

    Try writing a simple sed or Awk script to remove it instead.

    If the >From is always the first line of each file, try

    sed -i '' '1s/^>From/From/' *
    

    and if the files are not all in the current directory, maybe wrap that with

    find . -type d -exec sh -c 'cd "$1" && sed -i "" "1s/^>From/From/" *' _ {} \;
    

    to run it on all the subdirectories of the current directory.

    This assumes the file names will all fit on a single command line; if you get "Argument list too long", try

    printf '%s\n' * | xargs sed -i '' '1s/^>From/From/'
    

    or with find, try

    find . -type f -exec sed -i '' '1s/^>From/From/' {} +
    

    The printf variant is slightly brittle; if you can't get it to work because you have irregular file names with newlines in them etc, the find solution should not be hard to adapt to run in the current directory only (add -maxdepth 1 to prevent it from traversing subdirectories).
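
    Before editing anything in place, it may be worth checking which files are actually affected. The following is only a sketch: list_mangled is a hypothetical helper name, and it assumes the stray > only needs to be detected on the very first line of each file. It changes nothing; it just prints the matching file names.

    ```shell
    # list_mangled is a hypothetical helper name, not part of any mail tool.
    # It prints every regular file under the given directory (default: .)
    # whose first line begins with ">From ".
    list_mangled() {
        find "${1:-.}" -type f -exec sh -c '
            for f do
                first=
                # Read just the first line of the file.
                IFS= read -r first < "$f"
                case $first in
                    ">From "*) printf "%s\n" "$f" ;;
                esac
            done' sh {} +
    }
    ```

    Running list_mangled . | wc -l before and after the sed command gives a rough count of how many files were affected and whether any were missed.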

    In brief, some email servers change every From at the beginning of a line in the body of a message into >From (or, with quoted-printable MIME encoding, into =46rom, which a proper MIME client should transparently convert back for display when you view the message). I'm guessing you forwarded the entire mailbox inlined into a text/plain message, so perhaps the easiest fix is to send it again from the original source, this time wrapped in a suitable MIME container so that it won't be mangled in transport (maybe pack it into a .tar.gz and add that as a binary attachment).
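
    If re-sending is not an option and you prefer Awk over sed -i (whose syntax differs between macOS and GNU sed), the first-line repair could be sketched roughly as follows. fix_first_line is a hypothetical helper name; the sketch assumes only line 1 needs fixing and that you are working on a copy of the archive, since it keeps no backup.

    ```shell
    # fix_first_line is a hypothetical helper, not an existing tool.
    # It turns a leading ">From " on line 1 of one file into "From ".
    fix_first_line() {
        file=$1
        tmp=$(mktemp) || return 1
        # NR == 1 restricts the substitution to the first line, so quoted
        # ">From " lines inside message bodies are left untouched.
        awk 'NR == 1 { sub(/^>From /, "From ") } { print }' "$file" > "$tmp" &&
            # Write back only if something changed; cat keeps the original
            # file permissions, unlike moving the temp file into place.
            { cmp -s "$tmp" "$file" || cat "$tmp" > "$file"; }
        status=$?
        rm -f "$tmp"
        return "$status"
    }
    ```

    For the files in the current directory, it could be applied with for f in ./*; do [ -f "$f" ] && fix_first_line "$f"; done.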