sed cygwin hex ascii octal

Cygwin sed: search/replace one single unprintable char using hex, octal or decimal


Sorry for the simple question, but I have gone blind for four days studying and trying, and can't seem to strike the right syntax.

Using sed on cygwin, I am trying to replace one single unprintable ASCII character with another single unprintable character.

Here is my source file, using UPPERCASE text [within square brackets] to denote the unprintable ascii character:

myfile.txt:

line one[LF]
line two[LF]
line three[LF]
[SUBSTITUTE][LF]
line four[LF]
line five[LF]
line six[LF]
.
.
.

I would like to replace the LFs with TABs.

Since LFs are hex 0A and tabs are hex 09 I have tried, basically, this:

sed -i 's/\x0A/\x09/g' myfile.txt

which changes nothing in the file.

Of course, I have tried different switches like -b, -e and -r, with brackets and without, with and without the /g, extra backslashes and no backslashes, octal and decimal notation, all the way to Elven runes, with absolutely no success.

I read some answers that used 'echo' instead of a file as the source, but they just confused me and didn't work.

Other examples used 'cheats' like the actual word TAB, but they prevented me from learning the syntax using numerics, so I can apply it to other unprintable chars, not just TABs.

When I try the 'file' command, I get:

file myfile.txt
myfile.txt: data

So, of course I tried:

sed -i -t UTF-8 's/\x0A/\x09/g' myfile.txt

but my sed didn't support that -t option.

When I try this:

od -c myfile.txt

the [LF] character I'm searching for shows up as:

\n

I have also tried \0D as my search term, no luck either.

If anyone wants to lend me a clue by showing the correct syntax I would be very grateful.

Thanks.


Solution

  • Thanks everyone, I'm grateful for people trying to help. If StackOverflow lets me, I will upvote each attempt to help.

    I'm answering my own question in hopes it helps someone else.

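    I learned it's not quite true that sed cannot handle LFs. It can handle them, but only when it's writing them, not when reading them: sed reads the file line by line and strips the trailing LF from each line before the search pattern is applied, so the pattern never gets to see it.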
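    A quick way to see the read/write difference, as a minimal sketch assuming GNU sed (the \xHH escapes are a GNU extension). The sample strings and variable names are made up for illustration:

    ```shell
    #!/bin/sh
    # Reading: each line reaches the pattern without its trailing LF,
    # so \x0A in the search term never matches and nothing changes.
    read_attempt=$(printf 'one\ntwo\n' | sed 's/\x0A/\x09/g')

    # Writing: \x0A in the replacement is emitted as a real LF.
    write_attempt=$(printf 'one,two\n' | sed 's/,/\x0A/g')

    printf '%s\n' "$read_attempt"
    printf '%s\n' "$write_attempt"
    ```

    Both commands end up printing "one" and "two" on separate lines: the first because the LF was never touched, the second because sed wrote one.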

    So, I couldn't completely do the job with sed, as I hoped. I like sed's in-place switch, which seems less messy than creating another file and thus appeals to my OCD.

    The format of my file was:

    Mary(LF)
    Smith(LF)
    (SUB)(LF)
    John(LF)
    Public(LF)
    (SUB)(LF)
    

    and I wanted a result of:

    Mary(TAB)Smith(LF)
    John(TAB)Public(LF)
    

    So, I wanted to change LF to TAB, and LF-SUB-LF to LF.

    I solved my problem by first using tr to change all LFs to TABs. I couldn't use sed for this.

    # change LFs to TABs ... so sed can later treat entire file as one line
    tr '\012' '\011' < comengo.extract.txt > comengo.extract.out
    mv comengo.extract.out comengo.extract.txt
    

    That way, sed can now treat the entire file as one line. sed only likes to treat files line-by-line, so I made the whole file one single line.

    The second step was to let sed jump in, and make the changes I wanted. The gist of my question was "how do I represent non-printing ascii characters?".

    My previous attempts were failing because I was trying to match the LF itself (\x0A, octal 012) in the sed search string, where sed never sees it. Now that the LFs were replaced, I could use an uninterrupted sequence of hex numbers.
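    For reference, the characters involved can be written in either notation: the tr command above takes octal (\011 for TAB, \012 for LF), while GNU sed takes hex (\x09, \x0A, and \x1A for SUB, which is octal \032). A small check with od, assuming a POSIX printf and od; the variable name is only for illustration:

    ```shell
    #!/bin/sh
    # TAB, LF and SUB written in octal, dumped back as hex bytes
    hex_bytes=$(printf '\011\012\032' | od -An -tx1 | tr -d ' \n')
    echo "$hex_bytes"   # prints 090a1a
    ```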

    # changes (tab)(sub)(tab) to just (sub)
    sed -i 's/\x09\x1A\x09/\x1A/g'   comengo.extract.txt
    

    Then I restored the LFs to the file using sed, which can write LFs.

    # (sub) to (lf)
    sed -i 's/\x1A/\x0A/g'  comengo.extract.txt
    

    And that worked like a charm.
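
    For anyone who wants to try the whole sequence on throwaway data first, here is a minimal end-to-end sketch. It assumes GNU sed (for -i and the \xHH escapes), and sample.txt is a hypothetical scratch file in a temp directory, not my real extract. The last substitution writes only an LF, which is what the target format shown earlier calls for:

    ```shell
    #!/bin/sh
    work=$(mktemp -d)
    f="$work/sample.txt"

    # Build the sample: Mary LF Smith LF SUB LF John LF Public LF SUB LF
    printf 'Mary\nSmith\n\032\nJohn\nPublic\n\032\n' > "$f"

    # Step 1: LF -> TAB, so the whole file becomes one single line
    tr '\012' '\011' < "$f" > "$f.out"
    mv "$f.out" "$f"

    # Step 2: (tab)(sub)(tab) -> (sub)
    sed -i 's/\x09\x1A\x09/\x1A/g' "$f"

    # Step 3: (sub) -> (lf)
    sed -i 's/\x1A/\x0A/g' "$f"

    # Inspect the result byte by byte
    od -c "$f"
    ```

    The od output should show Mary, a \t, Smith, a \n, then John, a \t, Public and a final \n, with no SUB characters left.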