I have a .CSV file. When I check it for special characters using the command cat -vet filename.csv
I get very long lines with ^@, ^I^@ and ^@^M^ characters between each letter in all of the records. I checked the file type using the command
file filename.csv
I get the output as
filename.csv: Little-endian UTF-16 Unicode English character data, with very long lines, with CRLF, CR line terminators
I have a script to remove the Control-M (^M) characters from the file, but running it returns the error: cannot execute binary file.
I know that ^I represents a tab, and I have a script to convert ^I into commas. Can anyone help me fix the file with respect to the error and also the ^@ characters?
If your input really is UTF-16, then you should use iconv to convert your file from utf16 to something less cumbersome:
iconv -f utf16 -t utf8 < filename.csv > filename-utf8.csv
But I think that file got that wrong because of the zero bytes (displayed as ^@) in there.
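If the file does turn out to be UTF-16, the rest of the cleanup from the question (dropping ^M and turning tabs into commas) can be chained after iconv. The following is only a sketch under assumptions: sample.csv and sample-utf8.csv are made-up names, the sample starts with a byte order mark, and no field contains a comma or a tab of its own. Note that deleting every carriage return would also join lines that end in a bare CR, which the file output above suggests this file has, so check the result before relying on it.

```shell
# Build a tiny UTF-16LE sample (BOM, "a<TAB>b", CRLF) to stand in for the real file.
printf '\377\376a\000\t\000b\000\r\000\n\000' > sample.csv

# Decode to UTF-8, delete carriage returns, then replace each tab with a comma.
iconv -f UTF-16 -t UTF-8 < sample.csv | tr -d '\r' | tr '\t' ',' > sample-utf8.csv

# Inspect the result; the ^@, ^I and ^M markers should be gone.
cat -vet sample-utf8.csv
```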
You should have a look at your file using something like this to be sure of the contents:
xxd filename.csv | less
or
od -c filename.csv | less
in case you don't have xxd installed. This should show more accurately than cat what you've got there, byte by byte.
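One thing worth looking for in that dump is the first two bytes. A file that begins with ff fe carries a UTF-16 little-endian byte order mark, and in that encoding every plain ASCII character is followed by a zero byte. A small sketch, using a made-up sample file (bom-sample.bin) rather than the real one:

```shell
# Hypothetical sample: UTF-16LE byte order mark followed by "hi", two bytes per character.
printf '\377\376h\000i\000' > bom-sample.bin

# Print only the first two bytes as hex; "ff fe" points to UTF-16 little-endian.
od -An -tx1 -N2 bom-sample.bin
```

Running the same od command on filename.csv would show whether it starts with that mark.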