I am trying to append '.tsv' to the end of a column of text in a file.
You can do this easily with sed 's|$|.tsv|' myfile.txt
However, this is not working for my file, and I am trying to figure out why and how to fix it so that this works.
The column I want to edit looks like this:
$ cut -f12 chickspress.tsv | sort -u | head
Adipose_proteins
Adrenal_gland
Cerebellum
Cerebrum
Heart
Hypothalamus
Ovary
Sciatic_nerve
Testis
Tissue
But when I try to use sed, the result comes out wrong:
$ cut -f12 chickspress.tsv | sort -u | sed -e 's|$|.tsv|'
.tsvose_proteins
.tsvnal_gland
.tsvbellum
.tsvbrum
.tsvt
.tsvthalamus
.tsvy
.tsvtic_nerve
.tsvis
.tsvue
.tsvey
.tsvr
.tsv
.tsvreas
.tsvoral_muscle
.tsventriculus
The .tsv is supposed to be at the end of the line, not the front.
I thought there might be some whitespace error, so I tried this (macOS):
$ cut -f12 chickspress.tsv | sort -u | cat -ve
Adipose_proteins^M$
Adrenal_gland^M$
Cerebellum^M$
Cerebrum^M$
Heart^M$
Hypothalamus^M$
Ovary^M$
Sciatic_nerve^M$
Testis^M$
Tissue^M$
kidney^M$
liver^M$
lung^M$
pancreas^M$
pectoral_muscle^M$
proventriculus^M$
This ^M does not look right; it's not present in my other files, but I am not sure what it represents here, how to fix it, or how to get this sed command to work around it.
I produced this file using Python's csv.DictWriter in a script which I've used many times in the past, but I never noticed this error in its output before. It was run on macOS in this case.
EDIT: As per Ed's comment, if you want to remove carriage returns only at the end of lines, the following may help:
awk '{sub(/\r$/,"")} 1' Input_file > temp_file && mv temp_file Input_file
OR
sed -i.bak 's#\r$##' Input_file
Remove the Control-M (carriage return) characters as follows, then retry your command:
tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
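To see why the suffix ended up at the front, here is a minimal sketch on a throwaway sample (the temporary file and its two lines are made up for illustration). The ^M shown by cat -ve is a carriage return, so sed really does append .tsv at the end of the line, but after the carriage return; the terminal then jumps back to column 0 and prints .tsv over the start of the word.

```shell
# Hypothetical sample with DOS (CRLF) line endings, written to a temp file.
sample=$(mktemp)
printf 'Heart\r\nOvary\r\n' > "$sample"

# Without cleanup: od -c makes the hidden \r visible, sitting just before ".tsv".
broken=$(sed 's|$|.tsv|' "$sample" | od -c)
printf '%s\n' "$broken"

# With cleanup: delete the carriage returns first, then append the suffix.
fixed=$(tr -d '\r' < "$sample" | sed 's|$|.tsv|')
printf '%s\n' "$fixed"

rm -f "$sample"
```

The second command should print the names with .tsv at the end, as originally intended.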
Or, if the dos2unix utility is installed on your system, you can use it to remove these characters.
With awk:
awk '{gsub(/\r/,"")} 1' Input_file > temp_file && mv temp_file Input_file
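With sed (GNU sed understands \r; the BSD sed shipped with macOS may not, in which case prefer the tr or awk versions):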
sed -i.bak 's#\r##g' Input_file
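If you would rather leave the file untouched, the cleanup can also go straight into the pipeline from the question. This is only a sketch: strip_cr_suffix is a made-up helper name, and the sample input is invented for the check. Putting tr before sort -u matters when a file mixes line endings, because otherwise "Heart" and "Heart^M" would count as two different lines.

```shell
# strip_cr_suffix is a hypothetical helper, not part of any tool.
# It reads lines on stdin, deletes every carriage return, de-duplicates,
# and appends .tsv to each remaining line.
strip_cr_suffix() {
    tr -d '\r' | sort -u | sed 's|$|.tsv|'
}

# Intended use with the file from the question:
#   cut -f12 chickspress.tsv | strip_cr_suffix

# Self-contained check on input with mixed line endings:
result=$(printf 'Heart\r\nHeart\nOvary\r\n' | strip_cr_suffix)
printf '%s\n' "$result"
```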