I used this thread to split a large text file into several smaller files. To split the file I use the following command in Git Bash:
split -l 80000 largeFile
I then want to edit each of the output files, but when I open them in vi, the output looks weird and I cannot properly edit the file. The output contains a lot of @ symbols and carets. I assume that these are control characters. See the following screenshot:
My questions are:
If you look closely, every second character is ^@, which inside Vim represents a null byte (cf. :help <Nul>). The letters in between are readable (USE [TIP_Update_...). So what we're looking at is a 16-bit encoding (i.e. two bytes for each character) of (mostly?) ASCII text; as the null byte is the second one of each pair, it is little endian.
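To see why every second byte comes out as a null, here is a minimal Python sketch. The sample string is just the readable start of the text above; nothing else about the file is assumed.

```python
# Encode ASCII text as UTF-16 little endian (no BOM) and inspect the raw bytes.
text = "USE"
raw = text.encode("utf-16-le")

print(raw)         # b'U\x00S\x00E\x00'
print(list(raw))   # [85, 0, 83, 0, 69, 0]

# For ASCII characters, the high-order byte of each 16-bit unit is zero.
# In little endian it is stored second, so every second byte is a null.
high_bytes = raw[1::2]
assert all(b == 0 for b in high_bytes)

# Big endian would put the zero byte first instead.
print(text.encode("utf-16-be"))   # b'\x00U\x00S\x00E'
```

An editor that shows these bytes one by one displays each null as ^@, which gives the letter/^@ pattern described above.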
The first two characters (ÿþ) break the rule; this is a byte order mark (BOM, the bytes FF FE) that gives text editors a hint about the encoding. Because it is displayed as ÿþ instead of being consumed, Vim evidently thinks the text is in latin1 encoding.
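The effect of misreading the byte order mark can be reproduced outside Vim. The following is a small Python sketch, again using "USE" as stand-in content; it illustrates the byte-level behaviour and is not a description of how Vim works internally.

```python
import codecs

bom = codecs.BOM_UTF16_LE                  # b'\xff\xfe'
data = bom + "USE".encode("utf-16-le")

# Misreading the bytes as latin1 turns the BOM into two visible characters
# and leaves the null bytes in place (Vim renders those as ^@).
as_latin1 = data.decode("latin-1")
print(repr(as_latin1))                     # 'ÿþU\x00S\x00E\x00'

# The "utf-16" codec consumes the BOM and takes the byte order from it.
print(data.decode("utf-16"))               # USE
```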
So, you're dealing with 16-bit UCS-2 encoded Unicode (ISO/IEC 10646-1) (name in Vim: ucs-2le; see :help encoding-values), but Vim doesn't detect it automatically.
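You can either
explicitly specify the encoding when (re)loading the file (see :help ++enc):
:e! ++enc=ucs-2le
or adjust the 'fileencodings' option (see :help 'fileencodings') to detect such files automatically; actually, the default value includes ucs-bom and should detect these just fine.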
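The idea behind BOM-based detection can be sketched in a few lines of Python. This is only an illustration of the principle, not Vim's actual implementation; guess_encoding and its fallback parameter are hypothetical names introduced here, and UTF-32 marks are deliberately ignored.

```python
import codecs

# Hypothetical helper: pick an encoding from a leading BOM, else use a fallback.
# UTF-32 is not handled (its little-endian BOM also starts with FF FE).
def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # codec that strips the UTF-8 BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # codec that reads the byte order from the BOM
    return fallback

with_bom = codecs.BOM_UTF16_LE + "USE".encode("utf-16-le")
without_bom = "USE".encode("utf-16-le")

print(guess_encoding(with_bom))                    # utf-16
print(with_bom.decode(guess_encoding(with_bom)))   # USE
print(guess_encoding(without_bom))                 # latin-1
```

Data without a BOM falls through to the fallback, which is the situation where the encoding has to be named explicitly, as with ++enc.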