bash awk sed carriage-return

sed '$' matching start of line instead of end


I am trying to append '.tsv' to the end of a column of text in a file.

You can do this easily with sed 's|$|.tsv|' myfile.txt

However, this does not work for my file, and I am trying to figure out why and how to fix it.

The column I want to edit looks like this:

$ cut -f12 chickspress.tsv | sort -u | head
Adipose_proteins
Adrenal_gland
Cerebellum
Cerebrum
Heart
Hypothalamus
Ovary
Sciatic_nerve
Testis
Tissue

But when I try to use sed, the result comes out wrong:

$ cut -f12 chickspress.tsv | sort -u | sed -e 's|$|.tsv|'
.tsvose_proteins
.tsvnal_gland
.tsvbellum
.tsvbrum
.tsvt
.tsvthalamus
.tsvy
.tsvtic_nerve
.tsvis
.tsvue
.tsvey
.tsvr
.tsv
.tsvreas
.tsvoral_muscle
.tsventriculus

The .tsv is supposed to be at the end of each line, not the front.

I thought there might be some whitespace error, so I tried this (macOS):

$ cut -f12 chickspress.tsv | sort -u | cat -ve
Adipose_proteins^M$
Adrenal_gland^M$
Cerebellum^M$
Cerebrum^M$
Heart^M$
Hypothalamus^M$
Ovary^M$
Sciatic_nerve^M$
Testis^M$
Tissue^M$
kidney^M$
liver^M$
lung^M$
pancreas^M$
pectoral_muscle^M$
proventriculus^M$

This ^M does not look right; it is not present in my other files. I am not sure what it represents, how to remove it, or how to get this sed command to work around it.

I produced this file using Python's csv.DictWriter in a script which I've used many times in the past, but I never noticed this in its output before. The script was run on macOS in this case.


Solution

  • EDIT: As per Ed's comment, if you want to remove carriage returns only at the end of lines, the following may help.

    awk '{sub(/\r$/,"")} 1' Input_file > temp_file && mv temp_file Input_file
    

    OR

    sed -i.bak 's#\r$##' Input_file
    

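    The ^M shown by cat -v is a carriage return (\r, Control-M): the file has \r\n line endings. sed does append .tsv at the end of each line, but that is after the \r, so the terminal jumps back to the start of the line and prints .tsv over the first characters. Python's csv module writes \r\n by default (its lineterminator setting), which explains where they came from; passing lineterminator='\n' to csv.DictWriter avoids them. To remove the carriage returns from the existing file, run the following and then try your command again.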

    tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
    

    Or, if the dos2unix utility is installed on your system, you can use that to remove these characters.

    With awk:

    awk '{gsub(/\r/,"")} 1' Input_file > temp_file && mv temp_file Input_file
    

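    With sed (GNU sed understands \r; the BSD sed shipped with macOS may not, in which case bash's $'...' quoting, e.g. $'s/\r//g', is a common workaround):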

    sed -i.bak 's#\r##g' Input_file
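
If you would rather not rewrite the file, the same cleanup can go straight into the pipeline. The snippet below is only a minimal sketch: make_sample is a hypothetical helper that fakes two CRLF-terminated values in place of the real `cut -f12 chickspress.tsv` output. It shows where the .tsv really lands when the \r is left in, and what happens once tr strips it before sed runs.

```shell
# Hypothetical stand-in for `cut -f12 chickspress.tsv`:
# two values with Windows-style \r\n line endings.
make_sample() { printf 'Heart\r\nliver\r\n'; }

# No cleanup: sed appends .tsv after the trailing \r on each line.
broken=$(make_sample | sed 's|$|.tsv|')

# Strip the carriage returns first, then append.
fixed=$(make_sample | tr -d '\r' | sed 's|$|.tsv|')

# od -c makes the invisible \r visible in the broken output.
printf '%s\n' "$broken" | od -c

# Prints Heart.tsv and liver.tsv on separate lines.
printf '%s\n' "$fixed"
```

In the od output the \r sits between the word and .tsv, which is why the terminal appears to print the suffix at the front. With the real file, the equivalent pipeline would be `cut -f12 chickspress.tsv | tr -d '\r' | sort -u | sed 's|$|.tsv|'`.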