Tags: vim, split, vi, git-bash

VI shows control characters after splitting a large text file with git bash split


I used this thread to split a large text file into several smaller files. To split the file I use the following command in Git Bash:

split -l 80000 largeFile

I then want to edit each of the output files, but when I open them in VI, the output looks weird and I cannot properly edit the file. The output contains a lot of @ symbols and carets. I assume that these are control characters. See the following screenshot:

[Screenshot: the file as displayed in VI]

My questions are:

  • Why is the file displayed like this?
  • How can I properly edit the file in VI?

Solution

  • If you look closely, every second character is ^@, which inside Vim represents a null byte (cf. :help <Nul>). The letters in between are readable (USE [TIP_Update_...). So what we're looking at is a 16-bit encoding (i.e. two bytes per character) of (mostly?) ASCII text; since the null byte comes second in each pair, it is little endian.

    The first two characters (ÿþ) break the pattern; they are a byte order mark, which gives text editors a hint about the encoding. Judging by how it is displayed, Vim instead treats the text as latin1.

    So you're dealing with 16-bit UCS-2 encoded Unicode (ISO/IEC 10646-1; name in Vim: ucs-2le, see :help encoding-values), but Vim doesn't detect it automatically.

    You can either

    • manually force the encoding when re-reading the file (see :help ++enc): :e! ++enc=ucs-2le
    • reconfigure Vim (:help 'fileencodings') to detect such files automatically; actually, the default value includes ucs-bom and should detect a byte order mark just fine. You can check the current value with :set fileencodings?
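To make the byte layout described above concrete, here is a minimal Python sketch. It is not part of the original answer, and the sample string is made up because the real file contents are not known. It encodes some ASCII text as little-endian UTF-16 with a byte order mark, then decodes the same bytes once as latin1 (roughly what Vim appears to be doing in the screenshot) and once as UTF-16.

```python
import codecs

# Hypothetical sample text; the contents of the original file are not known.
sample = "USE demo"

# Encode as little-endian UTF-16 and prepend the byte order mark (bytes FF FE).
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# Decoded as latin1, every byte becomes one character: the byte order mark
# turns into the two characters U+00FF and U+00FE, and each letter is
# followed by a null byte (which Vim renders as ^@).
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1))

# Decoded as UTF-16, the byte order mark is consumed and selects little
# endian, so the original text comes back unchanged.
as_utf16 = raw.decode("utf-16")
print(as_utf16)
```

The first print shows the interleaved null bytes, and the second shows the readable text, which is the effect that forcing the encoding in Vim should have on the display.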